I guess I’m a glutton for punishment, but I decided recently to take a trip down memory lane and implement a data structure that I hadn’t used since college, the red-black tree. Before we did into implementation details, let’s review what exactly this data structure is. The red-black tree is a type of binary search tree that is self balancing. This is a key point, because it avoids the worse case scenario for a binary search tree where the nodes are inserted in order. Let’s look at a quick example. Let’s say we have a binary search tree of integers, and we insert 10 nodes in this order:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
The insertion strategy for a node is to find the correct place in the tree, and insert the node as a right child if it’s larger than the parent, and insert it as a left child if it’s smaller than the parent. The binary search tree in our example would look like this:
Storing the nodes like this does not take advantage of the power of a binary search tree, and this would perform the same as a linked list. If you were to search for 10 in this list, you would have to compare against every node in the tree, which is definitely not optimal. Now consider what a red-black tree would look like after inserting nodes in the same order:
Searching for 10 in this tree can be accomplished in half the comparisons, and this is for a tiny example! The actual performance of the tree for searching is O(log n), where a linked list is O(n). (which is how the binary search tree performs in it’s worse case) The red-black tree is self balancing, meaning that nodes are rotated when insertions or deletions happen to keep the height of the tree as small as possible, giving the tree optimum performance for searching no matter how the nodes are inserted.
OK, so we can see why using a red-black tree can be advantageous, let’s go over specifically what makes a tree qualify as a red-black tree. First, it’s a type of binary search tree, so all of the rules of a binary search tree apply here as well. This means that each node can have at most 2 children, and any nodes in the left sub-tree are less than the parent node, and any nodes in the right sub-tree are greater than the parent node. There are ways of dealing with duplicate nodes, but for this example I’m going to assume duplicates are ignored.
In a red-black tree, each node is assigned a color, obviously red or black. A red node can only have black children. When a node is inserted into the tree, it is colored red. If the parent is also red, you need to modify the tree so this rule is not violated, which often triggers tree rotation, which we will get into later. The next rule is that every path from the root to any leaf has the same amount of black nodes. You can see this enforced in the image above. There are 2 black nodes from the root (4) to any leaf in the tree. This is because the last rule is that the root node is always black, so you don’t need to count it to come up with the black node count to any leaf. The last thing I want to mention is that a lot of implementations have what are called sentinel nodes attached to every leaf, which are in effect 2 black null nodes. My implementation doesn’t use a sentinel node, I just make the children null.
Now that we have the rules, there are a few goals I wanted to accomplish with my C# version of this data structure. First, I wanted to make the data structure generic, so that I can store any type of data I want in my tree. There is a catch that most of you probably see already. In order to accept a data structure, I need to be able to compare it to satisfy the rules of a binary search tree. To do this, I can constrain T so that it must implement IComparable<T>. If I didn’t do this, I would have no way of knowing if 1 node was greater than or less than another node. My second goal is to allow the user to iterate over the nodes to get a list of the nodes in order. For you tree buffs out there, this is essentially an in-order traversal of the nodes. This goal is accomplished by implementing the IEnumerable<T> interface. The last thing I want to do is nest a node class in my tree class, I won’t need to use it outside the tree, so this will be OK. My nested node class will keep track of it’s color, it’s value, and have references to it’s left node, right node, and parent node. I also decided to add some helpful public properties such as IsLeaf and IsRoot.
I also created a NodeColor enumeration so I can keep track of the color of each node and a Direction emumeration to keep track of the parent direction. (so I can keep track of whether the node is a right node or a left one)
After taking all this into consideration, this is the skeleton I came up with:
1: public class RedBlackTree<T> :
2: IEnumerable<T> where T : IComparable<T>
3: {
4:
5: private class RedBlackNode
6: {
7: private readonly T _nodeValue;
8: private RedBlackNode _parentNode;
9: private RedBlackNode _leftNode;
10: private RedBlackNode _rightNode;
11:
12: public NodeColor Color { get; set; }
13:
14: public Direction ParentDirection
15: {
16: get
17: {
18: if (ParentNode == null ||
19: NodeValue.CompareTo(ParentNode.NodeValue) > 0)
20: return Direction.LEFT;
21:
22: return Direction.RIGHT;
23: }
24: }
25:
26: public T NodeValue
27: {
28: get { return _nodeValue; }
29: }
30:
31: public RedBlackNode ParentNode
32: {
33: get { return _parentNode; }
34: set { _parentNode = value; }
35: }
36:
37: public RedBlackNode LeftNode
38: {
39: get { return _leftNode; }
40: set { _leftNode = value; }
41: }
42:
43: public RedBlackNode RightNode
44: {
45: get { return _rightNode; }
46: set { _rightNode = value; }
47: }
48:
49: public Boolean IsRoot
50: {
51: get { return (_parentNode == null); }
52: }
53:
54: public Boolean IsLeaf
55: {
56: get
57: {
58: return (_leftNode == null && _rightNode == null);
59: }
60: }
61:
62:
63: public RedBlackNode(T nodeValue)
64: : this(nodeValue, null, null)
65: {
66:
67: }
68:
69: public RedBlackNode(T nodeValue,
70: RedBlackNode left, RedBlackNode right)
71: {
72: _nodeValue = nodeValue;
73: Color = NodeColor.RED;
74: _leftNode = left;
75: _rightNode = right;
76: _parentNode = null;
77: }
78:
79: public override string ToString()
80: {
81: return _nodeValue.ToString();
82: }
83:
84:
85: }
86: }
87:
88: public enum NodeColor
89: {
90: RED = 0,
91: BLACK = 1
92: }
93:
94: public enum Direction
95: {
96: LEFT = 0,
97: RIGHT = 1
98: }
Now the shell is built, it’s time to start filling out the tree class itself. I’ll add in a nodecount field, since it would be nice to quickly figure out how many nodes are in the tree. As far as nodes, the tree only needs to keep track of the root node. All of the nodes have references the their children, so we can traverse the tree easily by just storing the root node. I can also add in a few nice to have read only properties, IsEmpty, MinValue, and MaxValue. Finding the minimum and maximum values of a binary search tree is trivial. To find the minimum value, start from the root and follow every left child until you get to the leaf node. By definition, this will be the minimum value. The same concept applies for the maximum value, just start from the root and follow the right child. Our tree class now looks like this:
1: public class RedBlackTree<T> : IEnumerable<T>
2: where T : IComparable<T>
3: {
4: private RedBlackNode _rootNode;
5: private int _nodeCount;
6:
7: public T RootNode
8: {
9: get { return _rootNode.NodeValue; }
10: }
11:
12: public int NodeCount
13: {
14: get { return _nodeCount; }
15: }
16:
17: public Boolean IsEmpty
18: {
19: get { return (_rootNode == null); }
20: }
21:
22: public T MinValue
23: {
24: get
25: {
26: // I could have returned default(T) here,
27: // but I don't like that for integer trees.
28: // (returns a 0, which could be misleading)
29: if (IsEmpty)
30: throw new Exception("Error: Cannot determine minimum
31: value of an empty tree");
32:
33: // You can get the min value by traversing left from the root
34: // until you can't any more.
35: var node = _rootNode;
36: while (node.LeftNode != null)
37: node = node.LeftNode;
38:
39: return node.NodeValue;
40: }
41: }
42:
43: public T MaxValue
44: {
45: get
46: {
47: // I could have returned default(T) here,
48: // but I don't like that for integer trees.
49: // (returns a 0, which could be misleading)
50: if (IsEmpty)
51: throw new Exception("Error: Cannot determine maximum
52: value of an empty tree");
53:
54: // You can get the max value by traversing right from the root
55: // until you can't any more.
56: var node = _rootNode;
57: while (node.RightNode != null)
58: node = node.RightNode;
59:
60: return node.NodeValue;
61: }
62: }
63:
64: public RedBlackTree()
65: {
66: _rootNode = null;
67: _debugger = null;
68: }
69:
70: // Removed the nested node class to save space.
71: }
We’re now at the point where we can start inserting nodes into our tree. This is where the red-black tree gets interesting. The tree must be self balancing, meaning when we insert a node we may need to rotate the nodes to keep the height at a minimum height. Remember that when we insert a node into the tree, it is colored red, which could cause violations to the red-black tree rules covered above. I’m going to walk through the different insertion scenarios I came across and show examples of the tree before and after rotation. Before I get started, I have to give some credit to the people who put together this site. It has an interactive demo that allowed me to work through my rotation scenarios which proved to be invaluable.
The first case is the easiest. Obviously if the tree is empty the first node inserted will be made the root node. Since every node is colored red when it is inserted, this violates the rule where the root node needs to be black. All we have to do is change the color to black and we have a valid red-black tree. Another easy insertion case is when the parent of the inserted node is black. In this case the insertion is valid by rule.
The next case occurs when the uncle of the inserted node is also red. In this case the color from the grandparent is pulled down and we check the grandparent. If your head feels like it’s going to explode after reading that don’t worry, I have an illustration to shed some light on it.
In the above case, we start with a valid red black tree with 5 as the root, and 1 and 10 as the children. We insert the node 15 as shown on the left. This causes a violation, since a red node (10 in this case) can’t have a red child (our new node 15). The first thing we check before we worry about rotating the node is whether or not the uncle node is also red. In this case, the uncle node (1) is also red. The procedure for this is to swap the color from the grandparent (5) with the parent and uncle. (nodes 1 and 10) The result of this is the image in the middle. The yellow arrow next to 5 indicates that we have to check this node next. After making a change to the tree, it is important to keep checking nodes up the tree. We don’t want violations to happen up the tree because of a change we made. In this case, we turned the root node of the tree red, which is also a violation. The fix to this is to just turn the root node black, so the finished result of the insert is shown on the right. This is a valid red-black tree since the root is black, there aren’t any red nodes that have red children, and the black node count is the same from anywhere in the tree. This is an easy insertion case because we didn’t have to rotate any nodes.
The next 4 cases cover situations where rotation is necessary. The first example is when you have a red node with a red child on the right and it’s black parent to the left. In this case, the nodes are rotated in a way so that the parent of the inserted node becomes the parent of it’s parent, and colors are swapped so that the black node counts remain valid. Again, it’s way easier to show than to describe, so here is an illustration of this type of rotation:
In this case, we insert the node 3 into our tree. This causes a violation because a red node (2) has a red child (3). The rule from before does not apply, because there is not a red uncle node. In this case, we rotate so that 2 becomes the new root. It’s former parent (1) becomes it’s left child, and 3 remains the right child. Lastly, to make sure our black node count is correct, the black child (1) swaps colors with its new parent (2) to give you the end result on the right. The arrow shows that the next step would be to check 2 to make sure no other violations were introduced, but our tree is valid.
The next case is the inverse of the one above, so I won’t go into too much detail. It occurs when you have a red node with a red child on the left and a black parent on the right, and is illustrated below:
The third rotation type is when a red node has a red left child and a left black parent. This situation is shown here:
In this situation, we are inserting 32 into the tree. This causes a violation of the rules because a red node (35) has a red child (32). This if fixed initially by an inner rotation as shown in the middle step. The parent of the inserted node becomes the child of the inserted node. This doesn’t completely fix our problem however, but it puts us in a situation that we can solve. If you look now, we are presented with the same right child, left parent rotation that we covered above. The result of that rotation is shown on the right, and our tree is now valid.
This rotation also has an inverse where we have a right child with a right parent, and the result of this rotation is shown here:
Now that we have the cases for rotation worked out, it’s time to work this into our class. Rotation was tedious to implement, I had to be really careful when I relinked all of the nodes up to avoid circular references. This was made more complex by the fact that I’m keeping a reference to the parent in addition to each child. The following code handles the insertion of a node and also any necessary rotation or color swapping. (I also added a public method that tells whether or not a node exists in the tree)
1: public void Insert(T value)
2: {
3: if (_rootNode == null)
4: {
5:
6: // In this case we are inserting the root node.
7: var node = new RedBlackNode(value)
8: {ParentNode = null, Color = NodeColor.BLACK};
9: _rootNode = node;
10: _nodeCount++;
11: }
12: else
13: {
14: // The root already exists, so traverse the tree to figure out
15: // where to put the node.
16: InsertNode(value, _rootNode);
17: }
18: }
19:
20: public bool Contains(T value)
21: {
22: if (IsEmpty)
23: return false;
24:
25: var current = _rootNode;
26: while (current != null)
27: {
28: switch (value.CompareTo(current.NodeValue))
29: {
30: case -1:
31: current = current.LeftNode;
32: break;
33: case 1:
34: current = current.RightNode;
35: break;
36: default:
37: return true;
38: }
39: }
40:
41: // The item wasn't found.
42: return false;
43: }
44:
45: private void InsertNode(T value, RedBlackNode current)
46: {
47: if (value.CompareTo(current.NodeValue) == -1)
48: {
49: if (current.LeftNode == null)
50: {
51: var node = new RedBlackNode(value)
52: {
53: Color = NodeColor.RED,
54: ParentNode = current,
55: };
56: current.LeftNode = node;
57: _nodeCount++;
58: }
59: else
60: {
61: InsertNode(value, current.LeftNode);
62: return;
63: }
64: }
65: else if (value.CompareTo(current.NodeValue) == 1)
66: {
67: if (current.RightNode == null)
68: {
69: var node = new RedBlackNode(value)
70: {
71: Color = NodeColor.RED,
72: ParentNode = current,
73: };
74: current.RightNode = node;
75: _nodeCount++;
76: }
77: else
78: {
79: InsertNode(value, current.RightNode);
80: return;
81: }
82: }
83:
84: // Make sure we didn't violate the rules of a red/black tree.
85: CheckNode(current);
86:
87: // Automatically make sure the root node is black.
88: _rootNode.Color = NodeColor.BLACK;
89: }
90:
91: private void CheckNode(RedBlackNode current)
92: {
93: if (current == null)
94: return;
95:
96: if (current.Color != NodeColor.RED) return;
97:
98: var uncleNode = GetSiblingNode(current);
99: if (uncleNode != null && uncleNode.Color == NodeColor.RED)
100: {
101: // Switch colors and then check grandparent.
102: uncleNode.Color = NodeColor.BLACK;
103: current.Color = NodeColor.BLACK;
104: current.ParentNode.Color = NodeColor.RED;
105:
106: // We don't have to check the root node,
107: // I'm just going to turn it black.
108: if (current.ParentNode.ParentNode != null
109: && current.ParentNode.ParentNode.NodeValue.CompareTo(_rootNode.NodeValue) != 0)
110: {
111: var node = current.ParentNode.ParentNode;
112: CheckNode(node);
113: }
114: }
115: else
116: {
117: var redChild =
118: (current.LeftNode != null && current.LeftNode.Color == NodeColor.RED)
119: ? Direction.LEFT : Direction.RIGHT;
120:
121: // Need to rotate, figure out the node and direction for the rotation.
122: // There are 4 scenarios here, left child of right parent,
123: // left child of left parent, right child of right parent,
124: // right child of left parent
125: if (redChild == Direction.LEFT)
126: {
127: if (current.ParentDirection == Direction.RIGHT)
128: {
129: RotateLeftChildRightParent(current);
130: }
131: else
132: {
133: RotateLeftChildLeftParent(current);
134: }
135: }
136: else
137: {
138: // Only do this if the right child is red,
139: // otherwise no rotation is needed.
140: if (current.RightNode.Color == NodeColor.RED)
141: {
142: if (current.ParentDirection == Direction.RIGHT)
143: {
144: RotateRightChildRightParent(current);
145: }
146: else
147: {
148: RotateRightChildLeftParent(current);
149: }
150: }
151: }
152: }
153: }
154:
155: private static RedBlackNode GetSiblingNode(RedBlackNode current)
156: {
157: if (current == null || current.ParentNode == null)
158: return null;
159:
160: if (current.ParentNode.LeftNode != null
161: && current.ParentNode.LeftNode.NodeValue.CompareTo(current.NodeValue) == 0)
162: return current.ParentNode.RightNode;
163:
164: return current.ParentNode.LeftNode;
165: }
166:
167: private void FixChildColors(RedBlackNode current)
168: {
169: // If a node is red, both children must be black,switch if necessary.
170: if (current.Color == NodeColor.RED)
171: {
172: if (current.LeftNode != null
173: && current.LeftNode.Color == NodeColor.BLACK)
174: {
175: current.LeftNode.Color = NodeColor.RED;
176: current.Color = NodeColor.BLACK;
177: }
178: else if (current.RightNode != null
179: && current.RightNode.Color == NodeColor.BLACK)
180: {
181: current.RightNode.Color = NodeColor.RED;
182: current.Color = NodeColor.BLACK;
183: }
184: }
185: }
186:
187: private void RotateRightChildRightParent(RedBlackNode current)
188: {
189: // Don't rotate on the root.
190: if (current.IsRoot)
191: return;
192:
193: var tmpNode = current.RightNode.LeftNode;
194: current.RightNode.ParentNode = current.ParentNode;
195: current.ParentNode.LeftNode = current.RightNode;
196: current.ParentNode = current.RightNode;
197: current.RightNode.LeftNode = current;
198:
199: if (tmpNode != null)
200: {
201: current.RightNode = tmpNode;
202: tmpNode.ParentNode = current;
203: }
204: else
205: {
206: current.RightNode = tmpNode;
207: }
208:
209: // The new node to check is the parent node.
210: var newCurrent = current.ParentNode;
211: CheckNode(newCurrent);
212: }
213:
214: private void RotateLeftChildLeftParent(RedBlackNode current)
215: {
216: // Don't rotate on the root.
217: if (current.IsRoot)
218: return;
219:
220: var tmpNode = current.LeftNode.RightNode;
221: current.LeftNode.ParentNode = current.ParentNode;
222: current.ParentNode.RightNode = current.LeftNode;
223: current.ParentNode = current.LeftNode;
224: current.LeftNode.RightNode = current;
225:
226: if (tmpNode != null)
227: {
228: current.LeftNode = tmpNode;
229: tmpNode.ParentNode = current;
230: }
231: else
232: {
233: current.LeftNode = tmpNode;
234: }
235:
236: // The new node to check is the parent node.
237: var newCurrent = current.ParentNode;
238: CheckNode(newCurrent);
239: }
240:
241: private void RotateLeftChildRightParent(RedBlackNode current)
242: {
243: // Don't rotate on the root.
244: if (current.IsRoot)
245: return;
246:
247: if (current.RightNode != null)
248: {
249: current.ParentNode.LeftNode = current.RightNode;
250: current.RightNode.ParentNode = current.ParentNode;
251: }
252: else
253: {
254: current.ParentNode.LeftNode = current.RightNode;
255: }
256:
257: var tmpNode = current.ParentNode.ParentNode;
258: current.RightNode = current.ParentNode;
259: current.ParentNode.ParentNode = current;
260:
261: if (tmpNode == null)
262: {
263: _rootNode = current;
264: current.ParentNode = null;
265: }
266: else
267: {
268: current.ParentNode = tmpNode;
269:
270: // Make sure we have the pointer from the parent.
271: if (tmpNode.NodeValue.CompareTo(current.NodeValue) > 0)
272: {
273: tmpNode.LeftNode = current;
274: }
275: else
276: {
277: tmpNode.RightNode = current;
278: }
279: }
280:
281: FixChildColors(current);
282:
283: // The new node to check is the parent node.
284: var newCurrent = current.ParentNode;
285: CheckNode(newCurrent);
286: }
287:
288: private void RotateRightChildLeftParent(RedBlackNode current)
289: {
290: // Don't rotate on the root.
291: if (current.IsRoot)
292: return;
293:
294: if (current.LeftNode != null)
295: {
296: current.ParentNode.RightNode = current.LeftNode;
297: current.LeftNode.ParentNode = current.ParentNode;
298: }
299: else
300: {
301: current.ParentNode.RightNode = current.LeftNode;
302: }
303:
304: var tmpNode = current.ParentNode.ParentNode;
305: current.LeftNode = current.ParentNode;
306: current.ParentNode.ParentNode = current;
307:
308: if (tmpNode == null)
309: {
310: _rootNode = current;
311: current.ParentNode = null;
312: }
313: else
314: {
315: current.ParentNode = tmpNode;
316:
317: // Make sure we have the pointer from the parent.
318: if (tmpNode.NodeValue.CompareTo(current.NodeValue) > 0)
319: {
320: tmpNode.LeftNode = current;
321: }
322: else
323: {
324: tmpNode.RightNode = current;
325: }
326: }
327:
328: FixChildColors(current);
329:
330: // The new node to check is the parent node.
331: var newCurrent = current.ParentNode;
332: CheckNode(newCurrent);
333: }
The class can now handle the insertion of any node, and the tree will remain balanced due to our rotation methods. The last thing I want to touch on in this article is traversing the tree. I would like to allow the user to enumerate over the nodes in order, so I enabled IEnumerable<T>. This will be an in order traversal of the tree, which is a fairly simple recursive function. Remember that in an in order tree traversal, You visit the left sub tree, then the root, then the right sub tree recursively. The following code illustrates how to do this:
1: private static IEnumerable<T> InOrderTraversal(RedBlackNode node)
2: {
3: if (node.LeftNode != null)
4: {
5: foreach (T nodeVal in InOrderTraversal(node.LeftNode))
6: yield return nodeVal;
7: }
8:
9: yield return node.NodeValue;
10:
11: if (node.RightNode != null)
12: {
13: foreach (T nodeVal in InOrderTraversal(node.RightNode))
14: yield return nodeVal;
15: }
16: }
17:
18: public IEnumerator<T> GetEnumerator()
19: {
20: foreach (T val in InOrderTraversal(_rootNode)) { yield return val; }
21: }
22:
23: System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
24: {
25: return GetEnumerator();
26: }
In order to test to make sure my traversal was working correctly, I created a sample console application, inserted some random integer values, and then enumerated through them, printing them out.
1: var tree = new RedBlackTree<int>();
2:
3: tree.Insert(10);
4: tree.Insert(25);
5: tree.Insert(7);
6: tree.Insert(3);
7: tree.Insert(19);
8: tree.Insert(42);
9: tree.Insert(1);
10: tree.Insert(14);
11: tree.Insert(31);
12: tree.Insert(13);
13: tree.Insert(2);
14: tree.Insert(9);
15: tree.Insert(17);
16: tree.Insert(6);
17: tree.Insert(11);
18: tree.Insert(18);
19: tree.Insert(26);
20: tree.Insert(16);
21: tree.Insert(27);
22:
23: Console.WriteLine();
24: Console.WriteLine("Testing IEnumerable:");
25: foreach (int val in tree)
26: Console.WriteLine(val.ToString());
Running this produced the expected result, the nodes were printed out to the screen in order:
A red-black tree is a great data structure to use when you have to search a large data set, and it was an interesting journey to remember how to implement a data structure that I hadn’t seen since college. In the next article, I will cover how to implement delete, which can be trickier than insert.
0 comments:
Post a Comment