By: Saurav
2017-12-25 03:45:00 UTC
I am working on coming up with a pattern or a process flow to follow in an interview. One of the haunting data structure for me, as well as many folks, is a heap. Max heap or Min heap. I have faced interview problems like
1. Find top n words in a sentence
2. Find k max elements in an array
3. Find k max elements according to repetitions
and yes, I wasn't able to solve it in the best way at that time and it hurts till today.
So I thought let's spend time with the beast, get a good refresher and try to tame it.
A good tutorial is on youtube by Gayle Laakmann McDowell herself. Its great if you are in the flow of solving problems but at first its a bit complex. Another great resource is at Sitepoint. That is really easy to understand and is a great work.
A heap is essentially a binary tree, which follows a specific pattern. If the parent of two nodes is smaller than the nodes, and this property holds for the whole tree, it's a min heap. If the parent is greater than both, it's a max heap.
In a heap, the root will be the maximum or minimum among all the elements of the tree. However, if the elements are depicted as in array, it won't be sorted. So, keep that in mind.
It's great if we want to get the top n or bottom n elements with restricted space. The time complexity of converting an array into the heap is O(n) (Although it looks like O(nlogn), but for further reading view this article )).
We can implement a heap by using Node class but a better way from space efficiency is using an array. Elements aread from top to bottom, left to right.
For the heap binary tree to array conversion:
For an even and odd number of elements:
def left_child_index(index) if input_array.size%2 == 0 2*index else 2*index +1 end end
The function above gives the position of an element's left child in an array.
def right_child_index(index) if input_array.size%2 == 0 2*index +1 else 2*index +2 end end
As always. lets write some tests first:
new_heap = Heap.new([2,4,1,7,0,3,8]) p new_heap.build_heap == [8, 7, 3, 4, 0, 2, 1] new_heap1 = Heap.new([10,4,8,7,0,12,2,8]) p new_heap1.build_heap == [12, 10, 8, 8, 0, 4, 2, 7] new_heap1 = Heap.new([]) p new_heap1.build_heap == []
Implementing the Heap class:
class Heap attr_accessor :input_array def initialize(input_array) @input_array = input_array end def left_child_index(index) if input_array.size%2 == 0 2*index else 2*index +1 end end def right_child_index(index) if input_array.size%2 == 0 2*index +1 else 2*index +2 end end def is_greater_than_children(index) input_array[index] >= input_array[left_child_index(index)] and input_array[index] >= input_array[right_child_index(index)] ? true : false end def is_a_leaf_node(index) (index >= input_array.size/2) ? true : false end def max_heapify(index) return if is_a_leaf_node(index) or is_greater_than_children(index) largest_index = input_array[left_child_index(index)] > input_array[right_child_index(index)] ? left_child_index(index) : right_child_index(index) if input_array[index] < input_array[largest_index] input_array = swap(index,largest_index) end max_heapify(largest_index) end def swap(index,largest) temp = input_array[index] input_array[index] = input_array[largest] input_array[largest] = temp return input_array end def build_heap (input_array.size/2 - 1).downto(0).each do |index| max_heapify(index) end return input_array end def get_max_n_numbers(n) return input_array if n >= input_array.size top_n = Array.new (0..n-1).each do build_heap top_n << input_array.shift end return top_n end end
We need the build_heap function to run heapify function above. We only need to build from the parent of the leaf nodes so it starts from input.size/2-1.
I have also added a get_max_n_numbers function which can give n max elements in the heap.
This is always the core of the questions related to the heap.
Remember! When the interview question asks for:
Find top n elements / find winner / find max / find min / find most repetitive / find least repetitive / find top customer etc.
Only two things should come to your mind.
> Heap or sort. Heap will take O(n) while sort will take at least O(nlogn). Depending upon the heap size you can control the time complexity of the heap operations O(klogk).
In few questions where there is repetition (like finding most repetitive), Hash also comes in handy along with heap.
Owned & Maintained by Saurav Prakash
If you like what you see, you can help me cover server costs or buy me a cup of coffee though donation :)