Do you know what are the differences between hash join and merge join in SQL?
I am not sure if I understand it right. But to me, the big idea of hash join is to use a temporary hash table as a “lookup index”. The inner or target table has column values that pass any previous filters put into the hash table, and column values from the outer table are looked up in the hash table to see if they match. If they do, the join succeeds. The biggest restriction for me on hash joins is the native restrictions on Hash tables: a hash join can only be used for equality lookups, and not ranges or BETWEEN.
Also, I think that while hash joins are reasonably efficient, they’re rarely more efficient than using a native index to execute the join, so hash joins are usually used for “derived” joins where the working set has already been filtered by some other predicate or possibly a table scan that would otherwise be answered using a Nested loop join.
Hash tables may also be used to do stuff like answer IN.
On the other hand, a merge-join is able to treat the columns in the inner and outer tables as sorted lists, either by using indexes or by actively sorting them. A Merge algorithm is then used to figure out which rows in the joined tables or expressions satisfy the join criterion.
Merge joins are often called merge-sort joins as they are very similar to the steps in Merge sort. Merge joins can be used on equality as well as ranged operators such as <, >, or BETWEEN.
Any ideas?