登录查看更多内容

Mastering HashSet in C++: Unraveling the Power of unordered_set

RISHABH SINGH

Actively looking for Full-time Opportunities in AI/ML/Robotics | Ex-Algorithms & ML Engineer @ Dynocardia Inc | Computer Vision Research Assistant & Robotics Graduate Student @Northeastern University

发布日期: 2024年9月21日

In C++, the term “HashSet” is often confused with unordered_set, but they are essentially the same thing. C++ does not have a direct HashSet class like in some other programming languages (e.g., Java), but the unordered_set class serves the same purpose. Let's dive into the differences between unordered_set and set, and learn how to use unordered_set (or "HashSet") in C++ with examples.

What is unordered_set in?C++?

The unordered_set in C++ is a container that stores unique elements, just like a mathematical set. The key difference from the set is that unordered_set does not maintain any order of the elements. It stores elements in an arbitrary order, and it uses hashing to achieve faster insertion, deletion, and lookup times.

Time Complexity:

unordered_set: O(1) average for insert, delete, and search.
set: O(log n) due to the internal use of binary search trees (like Red-Black Trees).

Key Properties of unordered_set:

Unique Elements: It only stores unique elements, meaning no duplicates.
Hashing: It uses a hash table for storing elements, providing average constant time complexity (O(1)) for lookups, insertions, and deletions.
No Specific Order: Elements are not stored in any particular order.
Faster Operations: Due to the use of hashing, unordered_set is usually faster than set for most operations.

Difference Between set and unordered_set

Order:

set: Elements are ordered based on their values.
unordered_set: Elements are stored arbitrarily, based on their hash values.

Underlying Data Structure:

set: Uses a balanced binary search tree (usually a Red-Black Tree).
unordered_set: Uses a hash table.

Using unordered_set in C++: A Step-by-Step Guide

#include<bits/stdc++.h>
#include<unordered_set>

using namespace std; 

int main(){
  // Declare an unordered_set of integers
  unordered_set<int> hashSet;
  
  // Inserting elements
  hashSet.insert(10);
  hashSet.insert(20);
  hashSet.insert(30);
  
  // Duplicate elements are ignored
  hashSet.insert(10);
  
  // Print the elements (order isd not guaranteed)
  cout << "Element in Hashset: ";
  for(const auto &element : hashSet){
    cout<< element<< " ";
  }
  cout<<endl;

  // Checking if an element exist
  int value = 20;
  if(hashSet.find(value) != hashSet.end()){
    cout<<value<<" is present in the hashset."<<endl;
  }
  else{
    cout<<value<<" is not present in the hashset."<<endl;
  }
  
  // erasing an element
  hashSet.erase(10);

  // Size of the unordered_set
  cout<<"Size of hashSet: "<< hashSet.size() << endl;
  return 0;
}

Now, let’s try to solve some LeetCode questions to get more clear understanding of unordered_set and set.

Example1. Unique Number of Occurrences

Given an array of integers arr, return true if the number of occurrences of each value in the array is unique or false otherwise.

领英推荐

Using Go Generics

Colin Wilcox MBA 5 个月前

Go, C, Rust, and More: Picking the Right eBPF…

Teodor Podobnik 1 个月前

Comparing Rust, C++, Python, Java, Go, and…

Juliano Souza 5 个月前

Example 1:
Input: arr = [1,2,2,1,1,3]
Output: true
Explanation: The value 1 has 3 occurrences, 2 has 2 and 3 has 1. 
No two values have the same number of occurrences.

Example 2:
Input: arr = [1,2]
Output: false

Example 3:
Input: arr = [-3,0,1,-3,1,1,1,-3,10,0]
Output: true

class Solution {
public:
    bool uniqueOccurrences(vector<int>& arr) {
        // Create an unordered_map to store the frequency of each element in arr
        unordered_map<int, int> umap; 
        
        // Step 1: Count the occurrences of each element in the array
        for(int i = 0; i < arr.size(); i++) {
            // Increment the frequency of arr[i] in the map
            umap[arr[i]]++;  
        }
        
        // Create a set to store unique frequencies of the elements
        set<int> freq;  

        // Step 2: Check if all frequencies are unique
        // Iterate through each (key, value) pair in the unordered_map
        for(auto it : umap) {  
            if(freq.find(it.second) != freq.end()) {
                // If the frequency is already in the set, return false  
                return false;
            }
            // Insert the frequency into the set
            freq.insert(it.second);  
        }
        // If no duplicate frequencies were found, return true
        return true;  
    }
};

// Time Complexity: O(n log n) in the worst case
// Space Complexity: O(n)

Example2. Valid Sudoku

Determine if a 9 x 9 Sudoku board is valid. Only the filled cells need to be validated according to the following rules:

Each row must contain the digits 1-9 without repetition.
Each column must contain the digits 1-9 without repetition.
Each of the nine 3 x 3 sub-boxes of the grid must contain the digits 1-9 without repetition.

Note:

A Sudoku board (partially filled) could be valid but is not necessarily solvable.
Only the filled cells need to be validated according to the mentioned rules.

Example 1:
Input: board = 
[["5","3",".",".","7",".",".",".","."]
,["6",".",".","1","9","5",".",".","."]
,[".","9","8",".",".",".",".","6","."]
,["8",".",".",".","6",".",".",".","3"]
,["4",".",".","8",".","3",".",".","1"]
,["7",".",".",".","2",".",".",".","6"]
,[".","6",".",".",".",".","2","8","."]
,[".",".",".","4","1","9",".",".","5"]
,[".",".",".",".","8",".",".","7","9"]]
Output: true

Example 2:
Input: board = 
[["8","3",".",".","7",".",".",".","."]
,["6",".",".","1","9","5",".",".","."]
,[".","9","8",".",".",".",".","6","."]
,["8",".",".",".","6",".",".",".","3"]
,["4",".",".","8",".","3",".",".","1"]
,["7",".",".",".","2",".",".",".","6"]
,[".","6",".",".",".",".","2","8","."]
,[".",".",".","4","1","9",".",".","5"]
,[".",".",".",".","8",".",".","7","9"]]
Output: false
Explanation: Same as Example 1, except with the 5 in the top left corner 
being modified to 8. Since there are two 8's in the top left 3x3 sub-box, 
it is invalid.

class Solution {
public:
    bool isValidSudoku(vector<vector<char>>& board) {
        int m = board.size(); // rows
        int n = board[0].size(); // columns

        unordered_set<string> hashSet;
        for(int i = 0; i < m; i++){
            for(int j = 0; j < n; j++){
                if(board[i][j] != '.'){
                    // Construct unique identifiers for rows, columns, and boxes
                    string rowStr = string(1, board[i][j]) + " in row " + to_string(i);
                    string colStr = string(1, board[i][j]) + " in col " + to_string(j);  // Corrected column index
                    string boxStr = string(1, board[i][j]) + " in box " + to_string(i / 3) + "-" + to_string(j / 3);

                    // Check if the element already exists in the row, column, or box
                    if(hashSet.find(rowStr) != hashSet.end() || hashSet.find(colStr) != hashSet.end() || hashSet.find(boxStr) != hashSet.end()){
                        return false;  // Sudoku rule violation
                    }

                    // Insert the current element's row, column, and box information into the set
                    hashSet.insert(rowStr);
                    hashSet.insert(colStr);
                    hashSet.insert(boxStr);
                }
            }
        }
        return true;
    }
};

Most Important Logic to notice here is that, we create three strings to uniquely identify each value in the board with respect to:

The row it’s in: rowStr.
The column it’s in: colStr.
The 3x3 sub-box it’s in: boxStr.

string(1, board[i][j]) creates a string of length 1, containing the character at board[i][j] (the number in the cell).
to_string(i) converts the row index i to a string, and to_string(j) does the same for the column index j.
The string boxStr identifies the 3x3 sub-box, calculated by dividing the row and column indices by 3 (i / 3 and j / 3). This helps group the grid into the 3x3 sections required by Sudoku rules.

For example:

If board[0][0] contains '5', the strings will be:
rowStr = "5 in row 0",
colStr = "5 in col 0",
boxStr = "5 in box 0-0"

Then check if any of the three strings (rowStr, colStr, boxStr) already exist in the set hashSet

Time and Space Complexity:

Time Complexity: The solution iterates through all 81 cells in the 9x9 Sudoku board, and for each non-empty cell, it performs constant-time operations (hashing, finding, and inserting into the unordered_set). Therefore, the overall time complexity is O(81) = O(1), since the board size is fixed at 9x9.
Space Complexity: The space complexity is O(81) = O(1) as well, since in the worst case, the unordered_set will store at most 81 strings (one for each unique digit placed in a row, column, and 3x3 box).

要查看或添加评论，请登录

RISHABH SINGH的更多文章

Classification Measures in Machine Learning

2025年2月24日

Classification Measures in Machine Learning

In classification problems, it’s crucial to have effective measures to evaluate how well our model is performing…

2 条评论
Regularization in Machine Learning

2024年11月17日

Regularization in Machine Learning

Regularization is a technique used in machine learning to prevent overfitting, which occurs when a model learns the…
Logistic Regression

2024年11月4日

Logistic Regression

Logistic Regression is one of the most fundamental algorithms in Machine Learning and is primarily used for…
Why Logistic Regression Beats Linear Regression for Classification

2024年10月29日

Why Logistic Regression Beats Linear Regression for Classification

In machine learning, there are two main types of tasks: regression and classification. Linear Regression is designed…

2 条评论
Introduction to Machine Learning

2024年10月26日

Introduction to Machine Learning

Machine Learning (ML) is a branch of artificial intelligence (AI) that allows computers to learn and make predictions…
Statistics for Machine Learning

2024年10月24日

Statistics for Machine Learning

Statistics is described as a collection of tools and methods used to derive meaningful insights by performing…

4 条评论
Sliding Window Technique Simplified (C++)

2024年10月4日

Sliding Window Technique Simplified (C++)

The Sliding Window Technique is a powerful method to solve problems involving arrays or strings. It optimizes problems…
Natural Language Processing: Linear Text Classification

2024年9月29日

Natural Language Processing: Linear Text Classification

Linear classification refers to using a straight line (or hyperplane in higher dimensions) to separate different…
Mastering Arrays & Pointers (C++): Learning Basics to Solving Top Interview Questions (Part-2)

2024年9月25日

Mastering Arrays & Pointers (C++): Learning Basics to Solving Top Interview Questions (Part-2)

Ques.1 Remove Even Integers from Array Given an array of integers, arr, remove all the even integers from the array.

1 条评论
Introduction to Natural Language Processing: Byte Pair Encoding (BPE) and Natural Language Toolkit (NLTK)

2024年9月23日

Introduction to Natural Language Processing: Byte Pair Encoding (BPE) and Natural Language Toolkit (NLTK)

NLP is a subfield of artificial intelligence (AI) and computational linguistics. It focuses on enabling computers to…

2 条评论

See all articles

Mastering HashSet in C++: Unraveling the Power of unordered_set

RISHABH SINGH

Actively looking for Full-time Opportunities in AI/ML/Robotics | Ex-Algorithms & ML Engineer @ Dynocardia Inc | Computer Vision Research Assistant & Robotics Graduate Student @Northeastern University

What is unordered_set in?C++?

Key Properties of unordered_set:

Difference Between set and unordered_set

Using unordered_set in C++: A Step-by-Step Guide

领英推荐

Time and Space Complexity:

RISHABH SINGH的更多文章

社区洞察

其他会员也浏览了

Comparing Rust, C++, Python, Java, Go, and TypeScript/Node.js for Low-Latency, HFT, and Trading Applications

Go/Golang 101 - The Syntax and Basics

Why Elixir is my favorite programming language

Best Programming Languages to Learn in 2020

C++ Core Guidelines: Non-Rules and Myths

"Expertly crafting your first REST API in the language of your choice - guaranteed simplicity and cleanliness!"

C++ Core Guidelines: Rules for Variadic Templates

Rust Ownership 101

Dart: Mixins

What is unordered_set in?C++?

Key Properties of unordered_set:

Difference Between set and unordered_set

Using unordered_set in C++: A Step-by-Step Guide

领英推荐

Time and Space Complexity:

RISHABH SINGH的更多文章

Classification Measures in Machine Learning

Regularization in Machine Learning

Logistic Regression

Why Logistic Regression Beats Linear Regression for Classification

Introduction to Machine Learning

Statistics for Machine Learning

Sliding Window Technique Simplified (C++)

Natural Language Processing: Linear Text Classification

Mastering Arrays & Pointers (C++): Learning Basics to Solving Top Interview Questions (Part-2)

Introduction to Natural Language Processing: Byte Pair Encoding (BPE) and Natural Language Toolkit (NLTK)

社区洞察

其他会员也浏览了

Comparing Rust, C++, Python, Java, Go, and TypeScript/Node.js for Low-Latency, HFT, and Trading Applications

Go/Golang 101 - The Syntax and Basics

Why Elixir is my favorite programming language

Best Programming Languages to Learn in 2020

C++ Core Guidelines: Non-Rules and Myths

"Expertly crafting your first REST API in the language of your choice - guaranteed simplicity and cleanliness!"

C++ Core Guidelines: Rules for Variadic Templates

Rust Ownership 101

Dart: Mixins