CLEAN CODE: HEAP, GREAT FOR SOLVING INTERVIEW QUESTIONS INVOLVING: FIND TOP FIVE, TOP N, BOTTOM N ETC

By: Saurav

2017-12-25 03:45:00 UTC

I am working on coming up with a pattern or a process flow to follow in an interview. One of the haunting data structure for me, as well as many folks, is a heap. Max heap or Min heap. I have faced interview problems like

1. Find top n words in a sentence
2. Find k max elements in an array
3. Find k max elements according to repetitions

and yes, I wasn't able to solve it in the best way at that time and it hurts till today.

So I thought let's spend time with the beast, get a good refresher and try to tame it.

A good tutorial is on youtube by Gayle Laakmann McDowell herself. Its great if you are in the flow of solving problems but at first its a bit complex. Another great resource is at Sitepoint. That is really easy to understand and is a great work.

Heap

A heap is essentially a binary tree, which follows a specific pattern. If the parent of two nodes is smaller than the nodes, and this property holds for the whole tree, it's a min heap. If the parent is greater than both, it's a max heap.

In a heap, the root will be the maximum or minimum among all the elements of the tree. However, if the elements are depicted as in array, it won't be sorted. So, keep that in mind.

It's great if we want to get the top n or bottom n elements with restricted space. The time complexity of converting an array into the heap is O(n) (Although it looks like O(nlogn), but for further reading view this article )).

We can implement a heap by using Node class but a better way from space efficiency is using an array. Elements aread from top to bottom, left to right.

For the heap binary tree to array conversion:

For an even and odd number of elements:

def left_child_index(index)
    if input_array.size%2 == 0
      2*index
    else
      2*index +1 
    end
end 
  

The function above gives the position of an element's left child in an array.

 def right_child_index(index)
    if input_array.size%2 == 0
      2*index +1
    else
      2*index +2 
    end
  end 

As always. lets write some tests first:

new_heap = Heap.new([2,4,1,7,0,3,8])
p new_heap.build_heap == [8, 7, 3, 4, 0, 2, 1]

new_heap1 = Heap.new([10,4,8,7,0,12,2,8])
p new_heap1.build_heap == [12, 10, 8, 8, 0, 4, 2, 7]

new_heap1 = Heap.new([])
p new_heap1.build_heap == []

Implementing the Heap class:

class Heap 
  attr_accessor :input_array
  
  def initialize(input_array)
    @input_array = input_array
  end 
  
  def left_child_index(index)
    if input_array.size%2 == 0
      2*index
    else
      2*index +1 
    end
  end 
  
  def right_child_index(index)
    if input_array.size%2 == 0
      2*index +1
    else
      2*index +2 
    end
  end 
  
  def is_greater_than_children(index)
    input_array[index] >= input_array[left_child_index(index)] and input_array[index] >= input_array[right_child_index(index)] ? true : false
  end 
  
  def is_a_leaf_node(index)
    (index >= input_array.size/2) ? true : false 
  end 
  

  
  def max_heapify(index)
    
    return if is_a_leaf_node(index) or is_greater_than_children(index)
    
    
    largest_index = input_array[left_child_index(index)] > input_array[right_child_index(index)] ? left_child_index(index) : right_child_index(index)
    
    
    if input_array[index] < input_array[largest_index]
      input_array = swap(index,largest_index)
    end 
    
    max_heapify(largest_index)

  end 
  
  def swap(index,largest)
    temp = input_array[index]
    input_array[index] = input_array[largest]
    input_array[largest] = temp 
    return input_array
  end 
  
  def build_heap
    (input_array.size/2 - 1).downto(0).each do |index|
      
      max_heapify(index)
    end
    return input_array
  end 
  
  def get_max_n_numbers(n)
    return input_array if n >= input_array.size
    top_n = Array.new
    
    (0..n-1).each do 
      build_heap
      top_n << input_array.shift
    end 
    
    return top_n
  end 
end 

We need the build_heap function to run heapify function above. We only need to build from the parent of the leaf nodes so it starts from input.size/2-1.

I have also added a get_max_n_numbers function which can give n max elements in the heap.

This is always the core of the questions related to the heap.

Remember! When the interview question asks for:
Find top n elements / find winner / find max / find min / find most repetitive / find least repetitive / find top customer etc.
Only two things should come to your mind.
> Heap or sort. Heap will take O(n) while sort will take at least O(nlogn). Depending upon the heap size you can control the time complexity of the heap operations O(klogk).

In few questions where there is repetition (like finding most repetitive), Hash also comes in handy along with heap.

Owned & Maintained by Saurav Prakash

If you like what you see, you can help me cover server costs or buy me a cup of coffee though donation :)