Wednesday, November 30, 2011

Lecture Notes on Skip Lists


Introduction to AlgorithmsOctober 25, 2005
Massachusetts Institute of Technology6.046J/18.410J
Professors Erik D. Demaine and Charles E. LeisersonHandout 17
Lecture Notes on Skip Lists
Lecture 12 — October 26, 2005
Erik Demaine
Balanced tree structures we know at this point: red-black trees, B-trees, treaps.
Could you implement them right now? Probably, with time. . . but without looking up any details?
Skip lists are a simple randomized structure you’ll never forget.
Starting from scratch
Initial goal: just searches — ignore updates (Insert/Delete) for now
Simplest data structure: linked list
Sorted linked list: (ntime
2 sorted linked lists:
Each element can appear in 1 or both lists
How to speed up search?
Idea: Express and local subway lines
Example: 14 , 23, 34 , 42 , 50, 59, 66, 72 , 79, 86, 96 , 103, 110, 116, 125
(What is this sequence?)
Boxed values are “express” stops; others are normal stops
Can quickly jump from express stop to next express stop, or from any stop to next normal stop
Represented as two linked lists, one for express stops and one for all stops:
1434427296
14 23  34  42  50  59  66  72  79  86  96 103110116125
Every element is in bottom linked list (L2); some elements also in top linked list (L1)
Link equal elements between the two levels
To search, first search in Luntil about to go too far, then go down and search in L2
2    Handout 17: Lecture Notes on Skip Lists
– Cost:  |L2L  n
  L+ 
– Minimized when| 1||L1|1| |L1|
        
     n   
    |L1|L1|   
  =|L1|n   
  =|L1=8   
  n  8
  search cost = 2n
– Resulting 2-level structure:       
sqrt n        
sqrt nsqrt n   sqrt n  sqrt n
• 3 linked lists: · 8n        
• k linked lists: k · 8n        
• lg linked lists: lg n · lg8= lg n · n1lg = (lg n)   
   =2     
– Becomes like a binary tree:
14       79      
14   50   79   110  
14 34 50 66 79 96 110 125
1423344250596672798696103110116125
(In fact, a level-linked B+-tree; see Problem Set 5.)
Example: Search for 72
Level 1: 14 too small, 79 too big; go down 14
Level 2: 14 too small, 50 too small, 79 too big; go down 50Level 3: 50 too small, 66 too small, 79 too big; go down 66Level 4: 66 too small, 72 spot on
Handout 17: Lecture Notes on Skip Lists3
Insert
New element should certainly be added to bottommost level (Invariant: Bottommost list contains all elements)
Which other lists should it be added to?
(Is this the entire balance issue all over again?)
Idea: Flip a coin
With what probability should it go to the next level?
To mimic a balanced binary tree, we’d like half of the elements to advance to the next- to-bottommost level
So, when you insert an element, flip a fair coin
If heads: add element to next level up, and flip another coin (repeat)
Thus, on average:
1/the elements go up 1 level
1/the elements go up 2 levels
1/the elements go up 3 levels
Etc.
Thus, “approximately even”
Example
Get out a real coin and try an example
You should put a special value -at the beginning of each list, and always promote this special value to the highest level of promotion
This forces the leftmost element to be present in every list, which is necessary for searching
.. . many coins are flipped . . .
(Isn’t this easy?)
The result is a skip list.
It probably isn’t as balanced as the ideal configurations drawn above.
It’s clearly good on average.
Claim it’s really really good, almost always.
4Handout 17: Lecture Notes on Skip Lists
Analysis: Claim of With High Probability
Theorem: With high probabilityevery search costs O(lg nin a skip list with elements
What do we need to do to prove this? [Calculate the probability, and show that it’s high!]
We need to define the notion of “with high probability”; this is a powerful technical notion, used throughout randomized algorithms
Informal definition: An event occurs with high probability if, for any → 1, there is an appropriate choice of constants for which occurs with probability at least - O(1/n )
In reality, the constant hidden within O(lg nin the theorem statement actually depends on c.
Precise definition: A (parameterized) event occurs with high probability if, for any1occurs with probability at least - c /n , where is a “constant” depending only on .
The term O(1/n or more precisely c /n is called the error probability
The idea is that the error probability can be made very very very small by setting to something big, e.g., 100
Analysis: Warmup
Lemma: With high probability, skip list with elements has O(lg nlevels
(In fact, the number of levels is (log n), but we only need an upper bound.)
Proof:
– Pr{element is in more than lg levels= 1/2lg = 1/nc
– Recall Boole’s inequality / union bound:
Pr{E= E= · · · = EPr{E1+ Pr{E2· · · + Pr{Ek}
Applying this inequality:
Pr{any element is in more than lg levels} n · 1/n= 1/nc-1
Thus, error probability is polynomially small and exponent ( c - 1) can be made arbitrarily large by appropriate choice of constant in level bound of O(lg n)
Handout 17: Lecture Notes on Skip Lists5
Analysis: Proof of Theorem
Cool idea: Analyze search backwards—from leaf to root
Search starts at leaf (element in bottommost level)
At each node visited:
If node wasn’t promoted higher (got TAILS here), then we go [came from] leftIf node was promoted higher (got HEADS here), then we go [came from] up
Search stops at root of tree
Know height is O(lg nwith high probability; say it’s lg n
Thus, the number of “up” moves is at most lg with high probability
Thus, search cost is at most the following quantity:
How many times do we need to flip a coin to get lg heads?
• Intuitively, (lg n)
Analysis: Coin Flipping
Claim: Number of flips till lg heads is (lg nwith high probability
Again, constant in (lg nbound will depend on
Proof of claim:
Say we make 10lg flips
When are there at least lg heads?
lg 9lg n
Pr{exactly lg headslg · · 2
HHHTTTordersvsHTHTHT heads  tails 
           
– Pr{at most lg heads10cclglgn·21 9lg n    
             
 overestimateon orders tails     
              
           
– Recall bounds on x: xy xe xy x  
           
6Handout 17: Lecture Notes on Skip Lists
– Applying this formula to the previous equation:
Pr{at most lg heads} 10cclglgnn29lg n  
  e ·c10lgcnlg lg · 21 9lg n
      
 =(10e)lg ·29lg n  
 =2lg(10e)·c lg · 29lg n  
 =2(lg(10e)-9)lg n  
 =2lg n   
 =1/n   
The point here is that, as 10 *= 9 lg(10e*, independent of (for all) c
End of proof of claim and theorem
Acknowledgments
This lecture is based on discussions with Michael Bender at SUNY Stony Brook.