Hashing Algorithm
Hashing MCQs for GATE (with answers & solutions)
- Which property of a good hash function reduces clustering and gives uniform distribution?
A) Determinism
B) Large key range but poor mixing of bits
C) Avalanche effect (small change in key → large change in hash)
D) Simplicity (use key mod table size)
Answer: C
Solution: The avalanche effect spreads nearby keys across the table, reducing clustering. Determinism is required but not sufficient.
- If a hash table has 1000 slots and 700 keys inserted using chaining, the load factor α = n/m equals:
A) 0.7
B) 7000
C) 1.43
D) 0.07
Answer: A
Solution: α = n/m = 700/1000 = 0.7.
- For separate chaining, expected search time (successful) is Θ(1 + α). If α = 2, expected successful search cost is:
A) Θ(1)
B) Θ(2)
C) Θ(3)
D) Θ(0.5)
Answer: C
Solution: Θ(1 + α) = Θ(1 + 2) = Θ(3), i.e., about three probes on average. (Asymptotically Θ(3) = Θ(1); the point is that expected cost grows linearly with α.)
- Which collision-resolution method guarantees probing all table slots if table size m is prime and step size is h2(k) = 1 + (k mod (m−1))?
A) Linear probing
B) Quadratic probing
C) Double hashing with given h2
D) Separate chaining
Answer: C
Solution: Double hashing works here because h2(k) = 1 + (k mod (m−1)) ∈ [1, m−1] is never zero and, since m is prime, always coprime to m, so the probe sequence visits all m slots.
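A quick sketch of this probe sequence in code (the table size and helper names are our own illustrative choices, not part of the question):

```python
m = 13  # prime table size

def h1(k):
    return k % m

def h2(k):
    return 1 + (k % (m - 1))  # in [1, m-1]: never zero, coprime to prime m

def probe_sequence(k):
    """Return the first m probe indices for key k under double hashing."""
    return [(h1(k) + i * h2(k)) % m for i in range(m)]

# Because gcd(h2(k), m) = 1, the m probes visit every slot exactly once.
for k in (0, 7, 25, 99):
    assert sorted(probe_sequence(k)) == list(range(m))
```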
- Which of these is a primary disadvantage of linear probing?
A) Secondary clustering
B) Primary clustering
C) Requires extra memory per entry
D) Slow modulus operation
Answer: B
Solution: Linear probing causes primary clustering — long contiguous runs of occupied slots.
- Double hashing reduces which kind of clustering?
A) Primary clustering only
B) Secondary clustering only
C) Both primary and secondary clustering
D) Neither
Answer: C
Solution: Because the step h2(k) varies with the key, two keys with the same initial slot follow different probe sequences (no secondary clustering), and probe sequences do not merge into the long runs linear probing produces (no primary clustering).
- In open addressing, if load factor α approaches 1, expected unsuccessful search cost tends to:
A) O(1)
B) O(log n)
C) O(1/(1−α)) (blows up)
D) O(n) always
Answer: C
Solution: Expected probes for an unsuccessful search grow roughly as 1/(1−α) for many open-addressing schemes, which blows up as α → 1.
- A hash table uses chaining with m = 500 buckets and n = 2500 keys. Average chain length is:
A) 0.2
B) 5
C) 2500
D) 500
Answer: B
Solution: Average chain length = n/m = 2500/500 = 5.
- Which hash function is best when keys are consecutive integers?
A) h(k) = k mod 2^t
B) h(k) = k mod prime p near table size
C) h(k) = lower bits of k
D) h(k) = constant 0
Answer: B
Solution: Using a prime modulus near table size produces better mixing; power-of-two mod uses low bits which vary little for consecutive integers.
- Which technique is used to avoid deletions breaking search in open addressing?
A) Rehash entire table on deletion
B) Mark slot as “deleted” (tombstone)
C) Physically remove and shift entries left
D) Do nothing
Answer: B
Solution: Tombstones allow probes to continue; periodic rehash cleans up.
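A minimal sketch of tombstone deletion in a linear-probing table (the class name, sentinels, and tiny fixed table size are our own choices; Python's built-in hash stands in for a real hash function):

```python
EMPTY = None
DELETED = object()   # tombstone sentinel

class OpenTable:
    def __init__(self, m=8):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key):
        i = hash(key) % self.m
        while True:
            yield i
            i = (i + 1) % self.m

    def insert(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY or self.slots[i] is DELETED:
                self.slots[i] = key
                return

    def search(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:   # true empty slot: probe chain ends
                return False
            if self.slots[i] == key:     # tombstones compare unequal, probe continues
                return True

    def delete(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                return
            if self.slots[i] == key:
                self.slots[i] = DELETED  # keep later keys in the cluster reachable
                return

t = OpenTable()
t.insert(1); t.insert(9)   # 1 % 8 == 9 % 8, so 9 probes past slot 1
t.delete(1)
assert t.search(9)          # found only because slot 1 is a tombstone, not empty
```

If delete had set slot 1 back to EMPTY, the search for 9 would have stopped there and wrongly reported absence.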
- In a hash table with separate chaining, worst-case search time is:
A) Θ(1)
B) Θ(α)
C) Θ(n)
D) Θ(log n)
Answer: C
Solution: All keys could hash to the same bucket, forcing a Θ(n) scan of one chain.
- Which of these is a universal hashing guarantee?
A) Zero collisions for any input
B) Expected small collision probability for any fixed key pair under random choice of hash function family
C) Deterministic best distribution for keys
D) Requires huge table space
Answer: B
Solution: Universal hashing chooses function randomly from a family so collision probability between any two distinct keys is low (≤ 1/m).
- If you use a cryptographic hash like SHA-256 for an in-memory hash table of 10^6 items, main drawback is:
A) Too many collisions
B) Slow hashing cost per operation
C) Not deterministic
D) Not suitable for integers
Answer: B
Solution: Cryptographic hashes are computationally expensive; they produce excellent distribution but slower than simple hashes.
- For closed addressing (chaining), which resizing policy helps keep expected operations O(1)?
A) Never resize
B) Resize (expand) when α exceeds a threshold (e.g., 1)
C) Always rebuild table on every insert
D) Decrease m as n grows
Answer: B
Solution: Rehash into larger table when load factor too high keeps α = O(1).
- Which hash-table operation is easiest to parallelize?
A) Insert in open addressing with linear probing
B) Insert in separate chaining (locks per bucket)
C) Delete in open addressing (with tombstones)
D) Rehashing entire table
Answer: B
Solution: Per-bucket locks in chaining allow independent inserts in many buckets concurrently.
- In a cuckoo-hash with two hash functions h1,h2 and table size m, insertion can fail when:
A) Load factor less than 0.1
B) An infinite loop of displacements occurs (cycle)
C) Table contains duplicates only
D) Hash functions are identical
Answer: B
Solution: Cycles of displacement indicate insertion failure; usually resolved by rehashing.
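A sketch of cuckoo insertion where a displacement limit stands in for cycle detection (the function name, the two small tables, and the max_kicks bound are our own choices; a real implementation would rehash or resize on failure):

```python
def cuckoo_insert(t1, t2, h1, h2, key, max_kicks=32):
    """Try to insert key; return True on success, False when a cycle is suspected."""
    for _ in range(max_kicks):
        i = h1(key)
        if t1[i] is None:
            t1[i] = key
            return True
        t1[i], key = key, t1[i]          # evict the occupant of table 1
        j = h2(key)
        if t2[j] is None:
            t2[j] = key
            return True
        t2[j], key = key, t2[j]          # evict the occupant of table 2
    return False                          # give up: rehash/resize in practice

m = 5
t1, t2 = [None] * m, [None] * m
h1 = lambda k: k % m
h2 = lambda k: (k // m) % m
for k in (3, 8, 13):                      # all three collide in table 1 (k % 5 == 3)
    assert cuckoo_insert(t1, t2, h1, h2, k)
for k in (3, 8, 13):                      # every key sits in one of its two slots
    assert t1[h1(k)] == k or t2[h2(k)] == k
```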
- Which hashing scheme guarantees O(1) worst-case lookup for static sets?
A) Chaining with random hash
B) Perfect hashing (two-level)
C) Linear probing
D) Cuckoo hashing
Answer: B
Solution: Two-level (FKS) perfect hashing maps every key of a static set to a unique slot, giving O(1) worst-case lookup.
- Suppose you use modulo hashing with m = 100 (not prime). Keys are all even numbers. Which problem occurs?
A) Keys spread uniformly
B) All keys map to odd buckets
C) All keys map only to even bucket indices → clustering
D) Keys map to random buckets
Answer: C
Solution: If m is even, mapping k mod m for even k gives only even bucket indices → halves effective table.
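This is easy to verify directly (the key range below is our own choice): even keys modulo an even m hit only even indices, while a prime m covers every residue.

```python
even_keys = list(range(0, 200, 2))            # 100 even keys: 0, 2, ..., 198

used_even_m = {k % 100 for k in even_keys}    # m = 100 (even)
used_prime_m = {k % 101 for k in even_keys}   # m = 101 (prime)

assert all(i % 2 == 0 for i in used_even_m)   # only even buckets are ever used
assert len(used_even_m) == 50                 # half of the 100 buckets wasted
assert len(used_prime_m) == 100               # all 100 keys get distinct residues
```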
- Which probing sequence avoids both primary and secondary clustering?
A) i, i+1, i+2, …
B) i, i+1^2, i+2^2, …
C) i, i + c1 + c2k, … (linear-style)
D) i, i + h2(k), i + 2h2(k), … (double hashing)
Answer: D
Solution: Double hashing uses varying step for different keys, reducing clustering.
- If a Bloom filter uses m bits, k hash functions, and n inserted items, approximate probability that a given bit remains 0 after insertions is:
A) (1 − 1/m)^{kn} ≈ e^{−kn/m}
B) 1 − e^{−kn/m}
C) (1 − kn/m)
D) 1/m^{kn}
Answer: A
Solution: Each hash evaluation sets one bit; the probability a single evaluation leaves a given bit 0 is 1 − 1/m. Treating the kn evaluations as independent gives (1 − 1/m)^{kn} ≈ e^{−kn/m}.
- Using result from previous, Bloom filter false positive probability ≈:
A) (1 − e^{−kn/m})^k
B) e^{−kn/m}
C) k/m
D) 1 − (1 − e^{−kn/m})^k
Answer: A
Solution: A false positive occurs when all k probed positions are 1: p_fp = (1 − P[bit = 0])^k = (1 − e^{−kn/m})^k.
- A Bloom filter has m = 10,000 bits and expects n = 1000 inserts. Optimal k (number of hashes) approximates:
A) (m/n) ln 2 ≈ (10) ln2 ≈ 6.93 → 7
B) m/n
C) ln m
D) 1
Answer: A
Solution: Optimal k ≈ (m/n) ln 2 ≈ 10 * 0.693 ≈ 6.93 ≈ 7.
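The three Bloom-filter questions above are pure arithmetic; a short check of the formulas (no actual filter is built, and the numbers are the ones from the questions):

```python
import math

m, n = 10_000, 1_000

def bit_zero_prob(m, n, k):
    """P(a given bit is still 0) = (1 - 1/m)^(kn) ≈ e^(-kn/m)."""
    return (1 - 1 / m) ** (k * n)

def false_positive(m, n, k):
    """p_fp ≈ (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k

k_opt = (m / n) * math.log(2)             # optimal k ≈ (m/n) ln 2
assert round(k_opt) == 7                  # 10 * 0.693 ≈ 6.93 → 7 hashes
assert abs(bit_zero_prob(m, n, 7) - math.exp(-0.7)) < 0.001
assert false_positive(m, n, 7) < 0.01     # under 1% at the optimum
```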
- Given perfect hash exists for static set size n = 100 into table of size m = 100, lookup time worst-case is:
A) O(log n)
B) O(1)
C) O(n)
D) O(√n)
Answer: B
Solution: Perfect hashing provides direct mapping with constant-time lookup.
- If you use h(k) = ((a*k + b) mod p) mod m (multiplicative hashing with prime p > key range), this is an instance of:
A) Division method
B) Multiplicative method with modular arithmetic (universal hashing family)
C) Cryptographic hashing
D) Polynomial hashing only for strings
Answer: B
Solution: Functions of the form ((a·k + b) mod p) mod m with randomly chosen a, b form the classic Carter–Wegman universal hashing family.
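A sketch of this family in code (the prime, the table size, and the seeded draw are our own illustrative choices):

```python
import random

p = 2_147_483_647        # prime 2^31 - 1, assumed larger than the key universe here
m = 128

def make_hash(seed=None):
    """Draw one member h_{a,b}(k) = ((a*k + b) mod p) mod m of the family."""
    rng = random.Random(seed)
    a = rng.randrange(1, p)      # a ≠ 0
    b = rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m

# Drawing a fresh random function per table defeats any fixed bad input set.
h = make_hash(seed=42)
assert 0 <= h(123_456) < m
```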
- Open addressing with quadratic probing i^2 (i = probe count) can fail to find empty slot even if table not full unless m is:
A) prime
B) power of two
C) large enough and certain properties satisfied (e.g., m prime or probing formula chosen appropriately)
D) odd number
Answer: C
Solution: Quadratic probing may cycle over subset; careful choice of m or probe coefficients ensures full coverage.
- In hashing strings, rolling hash (as in Rabin-Karp) is useful because:
A) It gives zero collisions
B) It allows O(1) update when window slides by one character
C) It is cryptographically secure
D) It is perfect hash
Answer: B
Solution: Rolling hash supports efficient recomputation for sliding windows.
- Rabin-Karp average-case time for single pattern of length m in text length n is:
A) O(n + m) expected
B) O(nm) always
C) O(n log m)
D) O(1)
Answer: A
Solution: Rolling hash reduces per-position comparison expected to O(1), giving O(n+m).
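A compact Rabin-Karp sketch showing the O(1) roll (the base and modulus constants are our own illustrative picks; true matches are verified to rule out hash collisions):

```python
def rabin_karp(text, pat, base=256, mod=1_000_000_007):
    n, m = len(text), len(pat)
    if m > n:
        return []
    high = pow(base, m - 1, mod)               # weight of the outgoing character
    ph = th = 0
    for i in range(m):                          # hashes of pattern and first window
        ph = (ph * base + ord(pat[i])) % mod
        th = (th * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        if th == ph and text[i:i + m] == pat:   # confirm to avoid false positives
            hits.append(i)
        if i < n - m:                           # O(1) roll: drop text[i], add text[i+m]
            th = ((th - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits

assert rabin_karp("abracadabra", "abra") == [0, 7]
```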
- If you use double hashing with h2(k)=0 for some key, what happens?
A) Perfect hashing
B) Probe step becomes zero → infinite loop on collision
C) Works fine
D) Table size doubles automatically
Answer: B
Solution: Step size zero means probe i always returns same index; must ensure h2(k) ≠ 0 (often h2 defined as 1 + (k mod (m−1))).
- Suppose 2000 keys are hashed into 1000 buckets with chaining (n = 2000, m = 1000, α = 2). Under random hashing the expected maximum bucket length grows like Θ((ln n)/(ln ln n)), so the largest chain is roughly:
A) small constant like 6–10
B) n (2000)
C) ~1000
D) ~200
Answer: A
Solution: With random hashing, the maximum load is about (ln n)/(ln ln n) plus the average load; for n = 2000 this is ln 2000 / ln ln 2000 ≈ 7.6/2.0 ≈ 3.7, and with α = 2 the largest chain typically lands in the 6–10 range.
- Which technique supports deletion without tombstones and without rehashing every delete?
A) Separate chaining (simply remove from list)
B) Linear probing with shifting cluster elements left (costly)
C) Cuckoo hashing with relocation and possible rehash on failure
D) All of the above, but A is simplest
Answer: D (A is simplest)
Solution: Chaining allows straightforward deletion; open addressing often requires tombstones or expensive reshuffling.
- In Robin Hood hashing, when inserting a key with a probe distance larger than occupant’s, we:
A) Leave as is
B) Swap the keys so the one with larger probe distance steals slot (balancing)
C) Rehash entire table
D) Use chaining
Answer: B
Solution: Robin Hood swaps to minimize variance in probe distance; reduces maximum probe length.
- Which hashing technique can reduce worst-case lookup time by bounding probe length?
A) Linear probing
B) Cuckoo hashing (with suitable load factor)
C) Chaining with long lists
D) Simple modulo hashing only
Answer: B
Solution: Cuckoo hashing typically offers O(1) worst-case lookup (bounded number of lookups) unless rehash needed.
- If we have two hash functions that are highly correlated, double hashing performance will:
A) Improve dramatically
B) Degrade due to secondary clustering/correlated probes
C) Remain unchanged
D) Become perfect
Answer: B
Solution: Correlated hashes produce similar probe sequences; reduces effectiveness.
- Universal hashing family often uses random coefficients modulo a prime. The collision probability for two distinct keys ≤:
A) 1/m
B) 1/p
C) 1
D) 0
Answer: A
Solution: For a family designed for table size m, collision probability ≤ 1/m.
- When resizing a hash table to keep α ≤ 0.75, if current m = 2000 and n = 1800, the minimum new m should be:
A) 4000
B) 2400 (so α=1800/2400=0.75)
C) 1800
D) 2000
Answer: B
Solution: To achieve α ≤ 0.75 with n=1800, choose m ≥ 1800/0.75 = 2400.
- Which of the following hashing schemes allows predictable deletions without tombstones and O(1) expected lookup?
A) Separate chaining
B) Open addressing linear probing
C) Cuckoo hashing (deletions simple: remove and done)
D) Robin Hood hashing (needs handling)
Answer: C
Solution: Cuckoo deletions simply remove key; open addressing needs tombstones.
- In perfect hashing layer-two scheme, secondary tables are built so that for bucket of size s, secondary table size ~ s^2 to ensure collision-free mapping. Why s^2?
A) To always fit keys
B) Expected collisions small; s^2 gives constant expected sum of squares and allows collision-free assignment with high probability
C) Arbitrary choice
D) For cryptographic properties
Answer: B
Solution: By a birthday-bound argument, a random hash of s keys into ~s² slots is collision-free with probability ≥ 1/2, so a collision-free secondary table is found after O(1) expected retries; and E[Σ sᵢ²] = O(n), so total expected space stays O(n).
- Which of these is true for chaining vs. open addressing at moderate load (α ≈ 0.5)?
A) Chaining uses less memory per element
B) Open addressing typically has better cache performance since data contiguous
C) Chaining is always faster
D) Open addressing requires pointers per element
Answer: B
Solution: Open addressing stores entries contiguously aiding cache; chaining needs pointers per node.
- If table size m is power of two, which hash method is discouraged?
A) h(k) = k mod m (division method) when keys have low-bit patterns
B) Multiplicative hashing method using golden ratio constant
C) Cryptographic hashing
D) Universal hashing
Answer: A
Solution: Mod 2^t uses only low bits which may not vary, causing clustering.
- A good choice for h2(k) in double hashing is often:
A) h2(k) = 0
B) h2(k) = 1 + (k mod (m−1)) when m is prime
C) h2(k) = h1(k)
D) h2(k) = constant 2
Answer: B
Solution: Ensures step size coprime to m and nonzero.
- In a hash table using chaining, if average chain length is 4 and maximum chain length observed is 40, this indicates:
A) Uniform distribution
B) Heavy clustering / non-uniformity or adversarial inputs
C) Perfect hashing
D) No keys present
Answer: B
Solution: Large max chain indicates skewed distribution or bad hash function.
- Which hashing strategy is particularly vulnerable to Denial-of-Service (DoS) attacks if an attacker can craft keys?
A) Chaining with poor hash function (predictable)
B) Cryptographic hashing
C) Double hashing with random seeds
D) Perfect hashing
Answer: A
Solution: If attacker crafts many keys that collide into same bucket, server spends O(n) time per query; mitigated by randomizing hash.
- What is the main idea behind order-preserving minimal perfect hashing?
A) Preserve the input order in the hash values while mapping uniquely
B) Make hash cryptographically secure
C) Use chaining to handle collisions
D) Reduce memory to zero
Answer: A
Solution: Order-preserving minimal perfect hash maps keys to 0..n−1 in the order of keys.
- Which hashing approach supports range queries easily?
A) Standard hash table (unordered)
B) B-tree or ordered map
C) Bloom filter
D) Cuckoo hash
Answer: B
Solution: Hash tables do not preserve order and are unsuitable for range queries; B-trees are.
- If you need to test set membership and are willing to tolerate false positives but want minimal memory, use:
A) Hash table with chaining
B) Bloom filter
C) Cuckoo hashing
D) AVL tree
Answer: B
Solution: Bloom filter offers compact probabilistic membership with false positives.
- Counting Bloom filter differs from Bloom filter by allowing:
A) Deletions via counters per bit (small integers)
B) No false positives
C) Perfect hashing
D) Constant-time rehashing
Answer: A
Solution: Counting Bloom uses small counters allowing increments/decrements to support deletions.
- Suppose you use modular hashing h(k)=k mod m. To evenly distribute keys that are multiples of 5, choose m:
A) Not a multiple of 5 (e.g., prime)
B) Multiple of 5
C) Equal to 5
D) 1
Answer: A
Solution: If m divisible by 5, keys that are multiples of 5 map to same small subset; choose m coprime to key structure.
- Which of these hashing-based structures gives approximate counts of elements (frequency) with sub-linear memory and bounded error?
A) Count-Min Sketch
B) Simple hash table
C) Bloom filter
D) Cuckoo table
Answer: A
Solution: Count-Min Sketch estimates frequencies with tunable error using multiple hash functions.
- A hash table with open addressing and linear probing has n = 800 inserted into m = 1000. For successful search expected probes ≈ (1/2)(1 + 1/(1−α)) where α = 0.8. Compute approx probes:
A) (1/2)(1 + 1/(1−0.8)) = (1/2)(1 + 1/0.2) = (1/2)(1 + 5) = (1/2)(6)=3
B) 0.8
C) 10
D) 1
Answer: A
Solution: Plug in α=0.8: expected ≈ 3 probes.
- Which of the following improves hash performance against adversarial inputs?
A) Fixed simple hash function
B) Randomized (salted) hashing per run/session
C) Using key mod small constant
D) No hash function at all
Answer: B
Solution: Randomized hashing (per-process salt) prevents attackers from predicting collisions.
- What is the expected number of empty buckets when n = m (one key per bucket on average) under random hashing?
A) m * e^{−1} ≈ 0.3679 m
B) 0
C) m
D) n
Answer: A
Solution: Probability a bucket empty = (1 − 1/m)^n ≈ e^{−n/m} = e^{−1} when n=m; expected empty buckets ≈ m e^{−1}.
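The formula can be checked numerically with no simulation at all (the table size below is our own choice):

```python
import math

m = n = 10_000
# Exact expectation under random hashing: each bucket is empty with
# probability (1 - 1/m)^n, and expectation is linear over the m buckets.
expected_empty = m * (1 - 1 / m) ** n
assert abs(expected_empty / m - math.exp(-1)) < 0.001   # ≈ 0.3679 of all buckets
```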
- In open addressing, clustering length grows roughly as:
A) 1/(1−α)² for unsuccessful searches in linear probing
B) log n always
C) constant independent of α
D) n²
Answer: A
Solution: Linear probing unsuccessful probes expected ≈ (1 + 1/(1−α)²)/2 ~ O(1/(1−α)²).
- Which of these is a good strategy to build a hash table for keys that are strings of varying length?
A) Use polynomial rolling hash with modulus prime and then reduce modulo table size
B) Use only first character as key
C) Use length as hash
D) Use concatenation directly as index
Answer: A
Solution: Rolling polynomial hash mixes all characters; then take mod table size.
- Suppose we use multiplicative hashing h(k) = ⌊ m * frac(k*A) ⌋ with A = (√5 − 1)/2. This method is said to:
A) Depend less on table being prime and spread bits well
B) Be only for strings
C) Be insecure
D) Create collisions intentionally
Answer: A
Solution: Knuth’s multiplicative method with irrational A distributes keys reasonably across table sizes.
- In Cuckoo hashing with two tables of size m each and load factor α close to 0.5, expected lookup probes:
A) ≤ 2 (check two locations)
B) O(log n)
C) O(n)
D) 0
Answer: A
Solution: Lookup checks two positions (h1 and h2), giving up to two probes.
- Which structure supports membership queries with one-sided error (false positives allowed, false negatives impossible)?
A) Bloom filter
B) Hash table
C) Sorted array
D) Balanced BST
Answer: A
Solution: Bloom filters may report membership incorrectly (false positive) but never omit present items.
- If hash functions are not independent in a Bloom filter, false positive rate estimation:
A) May be invalid (analysis assumes independence)
B) Remains exact
C) Drops to zero
D) Always increases to 1
Answer: A
Solution: Independence assumption used in derivation; correlation alters actual rate.
- Which hashing scheme gives O(1) expected insertion but O(n) worst-case unless rehashing is used?
A) Chaining
B) Open addressing worst-case
C) Both A and B (worst-case pathological)
D) Perfect hashing always
Answer: C
Solution: Both can have pathological worst cases; expected amortized constant with randomness/resizing.
- What is the main drawback of separate chaining in terms of memory overhead?
A) Need pointers/objects for each entry (overhead per element)
B) Too few collisions
C) No deletions allowed
D) Only works for integers
Answer: A
Solution: Chaining needs node structures with pointers increasing memory per element.
- Which method avoids pointers for chaining while keeping expected O(1) operations and good cache locality?
A) Use contiguous arrays of entries and store next indices (array-based chaining)
B) Standard linked lists only
C) Trees as buckets
D) Random probing without arrays
Answer: A
Solution: Array-of-entries with indices reduces pointer overhead and improves locality.
- The birthday paradox says for m=365 possible birthdays, approx how many people until collision probability ≈ 0.5?
A) ≈ 23
B) ≈ 100
C) 365
D) 183
Answer: A
Solution: Classic result: about 23 people give ≈ 50% collision probability. (Analogy to hash collisions.)
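The exact probability is easy to compute by multiplying the chances that each successive person avoids all earlier birthdays:

```python
def collision_prob(q, m=365):
    """P(at least one shared birthday among q people)."""
    p_distinct = 1.0
    for i in range(q):
        p_distinct *= (m - i) / m        # person i avoids the first i birthdays
    return 1 - p_distinct

assert collision_prob(22) < 0.5 < collision_prob(23)   # crosses 50% at 23 people
```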
- Given two independent hash functions h1,h2 for cuckoo hashing, if cycle occurs during insertion, usual remedy is:
A) Rehash with new hash functions or increase table size
B) Mark full and stop
C) Delete all keys
D) Use chaining temporarily
Answer: A
Solution: Rehashing breaks cycle by changing mapping.
- The major advantage of open addressing over chaining when memory is precious is:
A) No extra pointers—entries in contiguous array → lower memory overhead
B) Better handling of large keys
C) Simpler deletion semantics
D) Always faster lookups
Answer: A
Solution: Open addressing stores entries in array slots, no per-element pointers.
- In hash-consing technique (interning), which is stored in hash table?
A) Unique immutable objects to ensure sharing and fast equality testing by pointer
B) Mutable objects only
C) Random data only
D) Only integers
Answer: A
Solution: Hash-consing stores canonical objects to share instances.
- Which of the following metrics quantifies the “quality” of a hash distribution?
A) Maximum bucket size, variance of bucket sizes, and collision rate
B) Alphabetical order of keys
C) Number of bits in keys only
D) Table color
Answer: A
Solution: Variance and maximum bucket size measure uniformity.
- When using open addressing, cluster formation causes which of the following?
A) Longer probe sequences for new insertions and searches
B) Faster lookups always
C) No effect on performance
D) Decrease in memory usage
Answer: A
Solution: Clusters increase probe lengths and degrade performance.
- Which method reduces memory fragmentation and improves locality for chaining?
A) Use object pools or arena allocators with contiguous blocks for nodes
B) Use individual malloc for every node
C) Use new per insert always
D) Avoid freeing memory ever
Answer: A
Solution: Pool allocation places nodes close in memory for caching.
- If a hash function maps most keys to first quarter of table, what’s the major effect?
A) High clustering → long chains/long probes → performance drops
B) Perfect distribution
C) No collisions
D) Table becomes empty
Answer: A
Solution: Skewed distribution concentrates load.
- When designing a hash for 64-bit integers, which approach spreads high and low bits uniformly?
A) Multiply by large odd constant (64-bit) then shift (multiplicative hashing)
B) Use only lower 8 bits
C) Use identity h(k)=k
D) Use key length only
Answer: A
Solution: Multiplicative hashing mixes bits across word.
- A cryptographic hash like SHA-1 has collision resistance property meaning:
A) Hard to find two inputs with same digest
B) Guaranteed zero collision for any finite table
C) Constant-time computation always
D) Not deterministic
Answer: A
Solution: Collision resistance means finding collisions is computationally infeasible (not impossible). Note that SHA-1's collision resistance is broken in practice; SHA-256 is the modern choice.
- In Bloom filter, increasing k (hash functions) while keeping m,n fixed will:
A) Decrease false positive initially then increase after optimal k
B) Always increase false positives
C) Always decrease false positives
D) Not change false positives
Answer: A
Solution: There is an optimal k ≈ (m/n) ln2; beyond that false positive increases.
- Which of the following supports constant-time approximate set intersection (with error) for large data streams?
A) MinHash / locality-sensitive hashing sketches
B) Perfect hash tables
C) Chaining lists
D) BSTs
Answer: A
Solution: MinHash estimates Jaccard similarity using hash-based sketches.
- If table size m is chosen prime, division method h(k)=k mod m is improved because:
A) It avoids regular patterns when keys have common factors
B) It reduces time complexity
C) It increases collisions for primes only
D) It becomes cryptographic
Answer: A
Solution: Prime modulus reduces correlation with key structure.
- Which is a good strategy when keys are long strings and hashing is expensive?
A) Cache hash values for keys after first computation
B) Recompute hash each time without caching
C) Use only first character as hash
D) Avoid data structures altogether
Answer: A
Solution: Caching avoids repeated expensive recomputation.
- XOR-folding a 64-bit key into 32-bit hash by XORing high and low halves is:
A) Simple mixing method but might preserve patterns; better than truncation but not ideal
B) Perfect hash
C) Unsafe for integers only
D) Equivalent to cryptographic hash
Answer: A
Solution: XOR folding helps but can still leave patterns; multiplicative hashing often better.
- Which of the following is true about consistent hashing (used in distributed systems)?
A) Minimizes remapped keys when number of servers changes
B) Requires rehashing all keys on server change
C) Not useful for load balancing
D) Only works for integer keys
Answer: A
Solution: Consistent hashing maps keys to ring; adding/removing nodes affects only adjacent segments.
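A minimal ring sketch (one point per node, no virtual nodes; the server names and use of md5 are purely illustrative):

```python
import bisect
import hashlib

def ring_hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.points = sorted((ring_hash(n), n) for n in nodes)

    def lookup(self, key):
        h = ring_hash(key)
        i = bisect.bisect(self.points, (h, ""))   # first node clockwise of the key
        return self.points[i % len(self.points)][1]

before = Ring(["server-a", "server-b", "server-c"])
after = Ring(["server-a", "server-b"])            # server-c removed
for i in range(20):
    key = f"user:{i}"
    if before.lookup(key) != "server-c":          # only server-c's keys move
        assert after.lookup(key) == before.lookup(key)
```

Production systems add many virtual points per node to even out segment sizes.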
- In a hash table used as an LRU cache, what extra is needed per entry?
A) Linked list pointers to maintain ordering (for eviction)
B) Nothing
C) Sorting keys on access
D) Bloom filter per entry
Answer: A
Solution: Doubly-linked list plus hash map gives O(1) get/put and eviction.
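In Python the hash-map-plus-linked-list pair can be sketched with collections.OrderedDict, whose move_to_end and popitem(last=False) provide exactly the O(1) reorder and eviction needed (a sketch, not a production cache):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cap = capacity
        self.d = OrderedDict()

    def get(self, key):
        if key not in self.d:
            return None
        self.d.move_to_end(key)          # mark as most recently used
        return self.d[key]

    def put(self, key, value):
        if key in self.d:
            self.d.move_to_end(key)
        self.d[key] = value
        if len(self.d) > self.cap:
            self.d.popitem(last=False)   # evict the least recently used entry

c = LRUCache(2)
c.put("a", 1); c.put("b", 2)
c.get("a")                               # touch "a", so "b" becomes LRU
c.put("c", 3)                            # over capacity: evicts "b"
assert c.get("b") is None and c.get("a") == 1
```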
- Given universal hashing family with random a,b in [0,p−1], h(k) = ((a*k + b) mod p) mod m. Why choose p > key universe and prime?
A) Ensures uniform arithmetic modulo p and reduces collisions analytically
B) Randomness replaced by p only
C) p must be composite
D) Not necessary
Answer: A
Solution: Prime modulus ensures arithmetic forms a field enabling uniform distribution proofs.
- Which hashing scheme allows amortized O(1) insertion but requires occasional O(n) rehashing?
A) Dynamic resizing (e.g., double table size when α≥threshold) with rehash
B) Fixed-size table never resized
C) Perfect hashing static only
D) Bloom filter
Answer: A
Solution: Resizing costs amortized over multiple inserts → amortized O(1).
- If Bloom filter uses k independent hash functions, to check membership we compute k hashes. For efficiency, we can derive k values using two hashes via:
A) Double hashing technique: h_i(x) = h1(x) + i*h2(x) mod m
B) Compute random numbers only once
C) Use only one hash always
D) Use length of key as second hash
Answer: A
Solution: Double hashing reduces cost to two hashes while approximating k independent hashes.
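A sketch of this trick (often attributed to Kirsch and Mitzenmacher); the md5/sha1 pairing is our illustrative stand-in for two fast base hashes:

```python
import hashlib

def bloom_indices(x, m, k):
    """Derive k Bloom bit positions from two base hashes: g_i = (h1 + i*h2) mod m."""
    h1 = int(hashlib.md5(x.encode()).hexdigest(), 16)
    h2 = int(hashlib.sha1(x.encode()).hexdigest(), 16)
    return [(h1 + i * h2) % m for i in range(k)]

idx = bloom_indices("hello", m=1024, k=7)
assert len(idx) == 7 and all(0 <= i < 1024 for i in idx)
```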
- What is the main benefit of hopscotch hashing?
A) Maintains small neighborhood so that lookup probes are short and cache-friendly
B) Avoids collisions entirely
C) Uses chaining instead of probing
D) Requires three hash functions always
Answer: A
Solution: Hopscotch keeps items within small probe distance for fast lookup.
- A hash table uses separate chaining; to find whether duplicates exist among n items, expected time is:
A) O(n) expected (insert and check)
B) O(n²) always
C) O(log n)
D) O(1)
Answer: A
Solution: Insert each and check bucket linear in chain length; expected total O(n + nα) = O(n).
- Which hashing-based sketch can be used to estimate distinct elements in a stream?
A) HyperLogLog / Flajolet–Martin (probabilistic counting)
B) Standard hash table only
C) Bloom filter only
D) AVL tree
Answer: A
Solution: HyperLogLog estimates cardinality using hashed leading zeros.
- If a hash table has poor distribution, which practical step helps debug?
A) Analyze bucket size distribution (histogram) to find skew
B) Reduce table size arbitrarily
C) Remove random keys
D) Use only first letter of key
Answer: A
Solution: Histogram reveals skew due to poor hash; then choose better hash or rehash.
- Suppose collision chain of length 100 occurs in chaining due to adversarial inputs. Effective mitigation:
A) Switch to randomized hash or resize table/double hashing or use tree-based buckets (like Java 8 uses treeify)
B) Continue as-is
C) Use linear probing instead
D) Decrease table size
Answer: A
Solution: Randomizing the hash, or converting a long bucket list into a balanced tree, reduces the worst case to O(log n).
- Which approach stores multiple keys per slot to reduce collisions and improve cache behavior?
A) Bucketization with small arrays per bucket (open-addressing variant)
B) Standard chaining only
C) Single-key slot mandatory
D) Bloom filter per slot
Answer: A
Solution: Storing small buckets (e.g., 4 entries) is cache-friendly and reduces probe length.
- In a hash table for integers with multiplicative hashing constant A, what is risk if A chosen poorly (like small rational)?
A) Poor mixing — cluster on certain bits → collisions
B) Faster mixing always
C) No effect
D) Table becomes cryptographic
Answer: A
Solution: The multiplicative constant should behave like an irrational number (e.g., the fractional part of the golden ratio scaled to the word size) so that it mixes bits well; small rationals leave regular patterns.
- Which of the following enables lock-free concurrent hash tables?
A) Atomic compare-and-swap (CAS) operations and careful design (e.g., split-ordered lists)
B) Big global lock only
C) No synchronization at all
D) Only allow reads
Answer: A
Solution: CAS-based techniques and careful memory reclamation allow lock-free concurrent tables.
- If we need to detect membership with deletions and no false positives, which is best?
A) Exact hash table (e.g., open addressing or chaining)
B) Bloom filter (no — has false positives)
C) Count-Min (approximate)
D) None
Answer: A
Solution: Exact structures maintain precise membership and allow deletions.
- Which property is critical for string hashing to avoid collision on common patterns?
A) Use polynomial rolling with large base and modulus or multiple moduli to reduce collisions
B) Use only first char
C) Use length only
D) Use concatenation as integer directly
Answer: A
Solution: Polynomial rolling mixes characters; modulus and base choice reduce collisions.
- A hash table is used as dictionary with frequent inserts and deletes. Which approach is simplest to keep performance stable?
A) Use chaining and resize when necessary
B) Use open addressing without tombstones or resizing
C) Use static perfect hashing
D) Use only arrays without hashing
Answer: A
Solution: Chaining handles deletions simply and resizing keeps α bounded.
- Which of the following reduces memory overhead for chaining in languages without pointer-sized overhead (e.g., packed arrays)?
A) Use contiguous arrays of entries and next indices (index-based linked lists)
B) Use new pointer per node always
C) Use Python dict only
D) Use trees only
Answer: A
Solution: Index-based chaining reduces per-node overhead and is compact.
- If you want ordered iteration with hashing-like average lookup, which data structure fits?
A) Hash table + doubly-linked list (ordered dict / LinkedHashMap)
B) Plain hash table only
C) Bloom filter
D) Heap
Answer: A
Solution: Maintain insertion order via linked list along with hash table for O(1) lookup.
- In practice, which factor often dominates hashing performance on modern CPUs?
A) Memory latency and cache misses, not pure CPU cycles for hash function
B) Hash function math only
C) Table size only
D) Key alphabet only
Answer: A
Solution: Cache misses and memory access patterns are often the bottleneck.
- Which of these is an advantage of robin-hood hashing over linear probing?
A) Lower variance of probe lengths and shorter maximum probe length → more predictable performance
B) Always fewer probes than any method
C) No collisions ever
D) Requires more memory than chaining always
Answer: A
Solution: Robin Hood balances probe distances.
- Which probabilistic data structure answers “Is x definitely not in set?” quickly and compactly?
A) Bloom filter (if it says No → definitely not; if Yes → maybe)
B) Hash table only
C) AVL tree
D) Heap
Answer: A
Solution: Bloom filter gives definite negatives quickly.
- When using multiplicative hashing with word size w and table size m = 2^r, the hash is h(k) = (kA mod 2^w) >> (w − r). Why shift?
A) To extract high bits of product which are well-mixed
B) To throw away meaningful info
C) To slow down hash
D) For encryption
Answer: A
Solution: High bits of kA are mixed and provide good distribution for power-of-two table sizes.
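A sketch with w = 64 and a Knuth-style constant (the constant and the key set are our own illustrative choices): keys that agree on all their low bits defeat low-bit bucketing, but the high bits of k·A still spread them out.

```python
W = 64
A = 0x9E3779B97F4A7C15   # ≈ 2^64 / φ, a large odd constant (Knuth-style choice)

def mult_hash(k, r):
    """Top r bits of the low-64-bit product k*A, for a table of size 2^r."""
    return ((k * A) & ((1 << W) - 1)) >> (W - r)

keys = [k * 4096 for k in range(100)]              # all keys share low 12 bits
assert {k & 1023 for k in keys} == {0}             # low-bit bucketing: one bucket
assert len({mult_hash(k, 10) for k in keys}) > 50  # multiplicative: many buckets
```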
- Which hashing approach is best when keys are sparse integers in a large universe and memory is tight?
A) Use a sparse hash map / dynamic perfect hashing / compressed tries depending on need
B) Use huge dense array indexed by key
C) Always use Bloom filter only
D) Use stack
Answer: A
Solution: Sparse specialized maps (e.g., sparse_hash_map) or perfect hashing can be memory efficient.
- If hash function is cryptographic and collision resistant, for typical hash table workloads that need speed, trade-off is:
A) slower inserts/lookup but strong collision resistance (overkill for non-adversarial)
B) faster overall always
C) impossible to implement
D) collisions become zero
Answer: A
Solution: Cryptographic hashes cost more CPU; choose faster non-crypto hashes for in-memory tables unless adversary present.
- Which is true about two-level perfect hashing (Fredman–Komlós–Szemerédi scheme) for static sets of size n?
A) Expected space O(n) and O(1) worst-case lookup after construction
B) Always uses O(n^2) space
C) Lookups cost O(log n) worst-case
D) Not possible in practice
Answer: A
Solution: FKS two-level scheme yields linear expected space and constant worst-case lookup for static sets.