2026-01-26
By the end of this lecture, you should be able to:

- describe linked lists and analyze the running times of their search, insert, and delete operations,
- explain how dictionaries can be implemented with arrays, linked lists, direct-address tables, and hash tables,
- construct hash functions using the division and multiplication methods, and
- resolve collisions by chaining and by open addressing with linear probing or double hashing.
In the random-access machine model, arrays provide constant-time access to elements at any given index. However, checking whether a specific value exists in an unsorted array of size \(n\) requires scanning the entire array. This process has a worst-case running time of \(\Theta(n)\), which occurs when the value is absent.
Additionally, inserting and deleting elements in arrays is inefficient if you want to maintain as much of the previous sequence as possible. For instance, inserting an element at the beginning of an array requires shifting all subsequent elements one position to the right, resulting in a worst-case running time of \(\Theta(n)\).
In this lecture, we will explore linked lists and hash tables, which potentially accelerate these operations.
Definition
A singly linked list is a data structure where each element contains both a key and a pointer to the next element in the list. The end of the list is marked by a special pointer, nil. Additionally, the list possesses a \(\mathit{head}\) attribute, which points to the first element in the list.
A doubly linked list extends this structure by giving each element an additional pointer to the previous element; for the first element, this pointer is nil.
To search a linked list, start at the head and traverse the list by following the pointers from one element to the next. The traversal continues until either the desired key is found or the end of the list is reached. If the key is found, a pointer to the element containing it is returned; otherwise, the procedure returns nil.
\begin{algorithm}
\begin{algorithmic}
\Procedure{List-Search}{$L$, $k$}
\State $x = L.\mathit{head}$
\While{$x \neq$ \textsc{nil} and $x.\mathit{key} \neq k$}
\State $x = x.\mathit{next}$
\EndWhile
\Return $x$
\EndProcedure
\end{algorithmic}
\end{algorithm}
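The pseudocode translates almost line for line into Python. The following sketch is illustrative only, with `None` standing in for nil and class and function names chosen by me:

```python
class Node:
    """A singly linked list element holding a key and a next pointer."""
    def __init__(self, key, next=None):
        self.key = key
        self.next = next

def list_search(head, k):
    """Return the first node whose key equals k, or None if absent."""
    x = head
    while x is not None and x.key != k:
        x = x.next
    return x

# Build the list 7 -> 10 -> 6 and search it.
head = Node(7, Node(10, Node(6)))
print(list_search(head, 10).key)  # 10
print(list_search(head, 3))       # None
```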
In the example below, List-Search\((L, 10)\) returns a pointer to the element at address 3E.
When inserting a new key into a linked list, two cases must be distinguished:
List-Prepend\((L, x)\):
The new key, pointed to by \(x\), is inserted at the beginning of the list.
List-Insert\((x, y)\):
The new key is to be inserted after an existing key. Here, \(x\) is assumed to be a pointer to the new key and \(y\) a pointer to the existing key. The list \(L\) is not a parameter of List-Insert because only the existing list element \(y\) is required as input, not the entire list.
The procedures for both cases are detailed on the following slides, where the list is assumed to be doubly-linked.
\begin{algorithm}
\begin{algorithmic}
\Procedure{List-Prepend}{$L$, $x$}
\State $x.\mathit{next} = L.\mathit{head}$
\State $x.\mathit{prev} = $ \textsc{nil}
\If{$L.\mathit{head} \neq$ \textsc{nil}}
\State $L.\mathit{head}.\mathit{prev} = x$
\EndIf
\State $L.\mathit{head} = x$
\EndProcedure
\end{algorithmic}
\end{algorithm}
The example below presents the result of calling List-Prepend\((L, x)\), where \(x\) points to the address 4D and \(x.\mathit{key} = 15\):
\begin{algorithm}
\begin{algorithmic}
\Procedure{List-Insert}{$x$, $y$}
\State $x.\mathit{next} = y.\mathit{next}$
\State $x.\mathit{prev} = y$
\If{$y.\mathit{next} \neq$ \textsc{nil}}
\State $y.\mathit{next}.\mathit{prev} = x$
\EndIf
\State $y.\mathit{next} = x$
\EndProcedure
\end{algorithmic}
\end{algorithm}
The example below illustrates the result of calling List-Insert\((x, y)\), where \(x\) points to the address 1B, \(x.\mathit{key} = 9\), and \(y\) is the element at the address 3E:
The following procedure removes the element pointed to by \(x\) from the list \(L\):
\begin{algorithm}
\begin{algorithmic}
\Procedure{List-Delete}{$L$, $x$}
\If{$x.\mathit{prev} \neq$ \textsc{nil}}
\State $x.\mathit{prev}.\mathit{next} = x.\mathit{next}$
\Else
\State $L.\mathit{head} = x.\mathit{next}$
\EndIf
\If{$x.\mathit{next} \neq$ \textsc{nil}}
\State $x.\mathit{next}.\mathit{prev} = x.\mathit{prev}$
\EndIf
\EndProcedure
\end{algorithmic}
\end{algorithm}
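A minimal Python sketch of the three doubly linked list procedures above; the class and method names are my own choices, and `None` again stands in for nil:

```python
class Node:
    """A doubly linked list element."""
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, x):
        """Insert node x at the front of the list (List-Prepend)."""
        x.next = self.head
        x.prev = None
        if self.head is not None:
            self.head.prev = x
        self.head = x

    @staticmethod
    def insert_after(x, y):
        """Insert node x immediately after node y (List-Insert)."""
        x.next = y.next
        x.prev = y
        if y.next is not None:
            y.next.prev = x
        y.next = x

    def delete(self, x):
        """Unlink node x from the list (List-Delete)."""
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.head = x.next
        if x.next is not None:
            x.next.prev = x.prev

# Build the list 15 -> 7 -> 9, then delete 7, leaving 15 -> 9.
L = DoublyLinkedList()
seven = Node(7)
L.prepend(seven)
L.prepend(Node(15))
DoublyLinkedList.insert_after(Node(9), seven)
L.delete(seven)
```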
The example below demonstrates the result of calling List-Delete\((L, x)\), where \(x\) points to the address 2A:
| Operation | Worst-Case Running Time | Reason |
|---|---|---|
| List-Search | \(\Theta(n)\) | Must examine all elements if key is not in the list. |
| List-Prepend, List-Insert, List-Delete | \(\Theta(1)\) | Involves only pointer updates. Note that the pointer to the element to be inserted or deleted must be provided as an argument. If only the key is known, a \(\Theta(n)\) search must be performed first. |
Compared to an unsorted array, a doubly-linked list does not reduce the asymptotic growth rate of the search operation in the worst case. However, insertions and deletions are faster than the \(\Theta(n)\) time required for an array.
Both arrays and linked lists can be used to implement sets:
Definition
A set is an unordered collection of unique elements.
For instance, the arrays and linked lists shown below represent the same set, \(\{0, 6, 7, 10\}\). The order of the elements is arbitrary and does not affect the representation of the set.
Definition
A dictionary is a data structure that stores a set of elements, referred to as keys, and supports the following operations:

- Search: given a key \(k\), return a pointer to the element with key \(k\), or nil if no such element exists,
- Insert: add a new key to the set, and
- Delete: remove a given key from the set.
Because keys constitute a set, they must be unique.
Definition
Satellite data are objects associated with keys in a dictionary. These associations remain unchanged during any dictionary operation.
Examples: in an employee database, the employee ID can serve as the key, with the remaining record (name, department, salary) as satellite data; in a phone book, names are keys and phone numbers are satellite data.
In principle, an unsorted array can be used as a dictionary: searching scans the array from left to right, insertion appends the new key at the end in \(\Theta(1)\) time, and deletion overwrites the deleted key with the last element in \(\Theta(1)\) time (once its position is known).
The \(\Theta(n)\) worst-case running time of the search operation renders unsorted arrays impractical for large dictionaries.
By storing keys in a sorted array and applying binary search, the search operation can be improved to \(\Theta(\log n)\) time.
However, insertion and deletion are slower than for unsorted arrays—\(\Theta(n)\) in the worst case—because the keys must be shifted to maintain the sorted order. For example, if the key to be added or deleted is the minimum in the set, \(\Theta(n)\) shifts are required.
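This trade-off can be seen with Python's bisect module as a stand-in for a sorted-array dictionary; a quick sketch, not a full implementation:

```python
import bisect

keys = [0, 6, 7, 10]   # sorted array of keys

def search(k):
    """Binary search: Theta(log n) comparisons."""
    i = bisect.bisect_left(keys, k)
    return i if i < len(keys) and keys[i] == k else None

def insert(k):
    """Keeps the array sorted, but shifts up to n elements."""
    bisect.insort(keys, k)

insert(3)          # keys is now [0, 3, 6, 7, 10]
print(search(6))   # 2 (index of key 6)
print(search(5))   # None
```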
Linked lists exhibit the same asymptotic worst-case running times as unsorted arrays: \(\Theta(n)\) for search and \(\Theta(1)\) for insertion and deletion.
Sorting the linked list does not improve the search time because the list must still be traversed sequentially from the head to locate the key.
Assume that the keys to be searched for or inserted into a dictionary are drawn from identical probability distributions and that each stored key is equally likely to be deleted. Under these assumptions, the average running times of the dictionary operations exhibit the same growth rates as their worst-case counterparts:
| Operation | Unsorted Array | Sorted Array | Linked List |
|---|---|---|---|
| Search | \(\Theta(n)\) | \(\Theta(\log n)\) | \(\Theta(n)\) |
| Insert | \(\Theta(1)\) | \(\Theta(n)\) | \(\Theta(1)\) |
| Delete | \(\Theta(1)\) | \(\Theta(n)\) | \(\Theta(1)\) |
In summary, arrays and linked lists are suboptimal for implementing dictionaries because at least one of the three dictionary operations—search, insert, or delete—is slow, with running times growing linearly with the dictionary size.
On the following page, we introduce direct-address tables, which allow all dictionary operations to run in \(O(1)\) time in the worst case.
Definition
A direct-address table \(T\) is an array-based data structure that stores a set of keys, each of which is an integer in the range from \(0\) to \(m - 1\), where \(m\) is the table size. The set of all possible keys \(\{0, 1, \ldots, m-1\}\) is called the universe.
For each key \(k\) in the universe, the table contains a slot \(T[k]\) that can store the key and any associated satellite data. If \(k\) is not in the set to be stored, \(T[k]\) is assigned the value nil.
The illustration on the following page depicts a direct-address table with a universe of size \(m = 10\) and keys in \(\{1, 4, 5, 9\}\).
Each of the three dictionary operations can be implemented in \(O(1)\) time using direct-address tables:
\begin{algorithm}
\begin{algorithmic}
\Procedure{Direct-Address-Search}{$T$, $k$}
\Return $T[k]$
\EndProcedure
\Procedure{Direct-Address-Insert}{$T$, $x$}
\State $T[x.\mathit{key}] = x$
\EndProcedure
\Procedure{Direct-Address-Delete}{$T$, $x$}
\State $T[x.\mathit{key}] = $ \textsc{nil}
\EndProcedure
\end{algorithmic}
\end{algorithm}
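A direct Python sketch of these three procedures, with `None` standing in for nil; the `Element` class is an assumed container for a key and its satellite data:

```python
class Element:
    """A key plus (optional) satellite data."""
    def __init__(self, key, data=None):
        self.key = key
        self.data = data

class DirectAddressTable:
    """Stores elements with integer keys in {0, ..., m - 1}."""
    def __init__(self, m):
        self.slots = [None] * m   # None plays the role of nil

    def search(self, k):
        return self.slots[k]

    def insert(self, x):
        self.slots[x.key] = x

    def delete(self, x):
        self.slots[x.key] = None

T = DirectAddressTable(10)
T.insert(Element(4, "satellite data for key 4"))
print(T.search(4).data)   # satellite data for key 4
print(T.search(5))        # None
```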
Although fast, direct-address tables have significant drawbacks:

- Every key must be a nonnegative integer smaller than the table size \(m\).
- The table requires \(\Theta(|U|)\) memory for the universe \(U\), even if the set \(K\) of keys actually stored is much smaller than \(U\). If the universe is large, allocating the table may be impractical or impossible.
To address these limitations, we will now introduce hash tables, which generally reduce memory requirements to \(\Theta(|K|)\) at the expense of an \(O(|K|)\) worst-case running time for dictionary operations.
However, the average running time for hash tables is \(O(1)\) under realistic assumptions. Moreover, with well-designed hash functions, it is highly unlikely to experience the worst-case scenario.
Definition
A hash table \(T\) is an array-based data structure that stores a set of keys from a universe \(U\) by mapping them to array indices using a hash function \(h: U \to \{0, 1, \ldots, m-1\}\), where \(m\) is the array size. Specifically, the key \(k\) is stored in the slot \(T[h(k)]\).
We say that “the key \(k\) hashes to slot \(h(k)\)” and that “\(h(k)\) is the hash value of key \(k\).”
If two distinct keys \(k_i\) and \(k_j\) hash to the same slot, we encounter a collision.
We will assume that \(h\) can be computed in \(O(1)\) time. A direct-address table is a special case of a hash table where \(h\) is the identity function.
The hash function \(h\) maps the keys to the slots in the hash table. In the figure below, \(k_1\) and \(k_2\) hash to the same slot, causing a collision.
Definition
A hash function \(h\) is said to be independent and uniform if it satisfies the following properties:

- Uniformity: each key is equally likely to hash to any of the \(m\) slots.
- Independence: the slot to which a key hashes is independent of the slots to which all other keys hash.
Uniform hashing is an idealized model that helps us analyze the expected behavior and key properties of hash tables, such as collision frequency and average running times.
In practice, hash functions are not independent and uniform. Instead, they are designed to be computationally efficient and deterministic. However, with careful hash-function design, we can approximate the ideal behavior in practice.
Here are common techniques to generate hash functions:

- the division method,
- the multiplication method, and
- universal hashing.
These techniques will be discussed in the following slides.
Assume that every key is a nonnegative integer. If necessary, a surrogate key can be created by mapping each input key to a unique nonnegative integer (e.g., \(\text{A} \to 0\), \(\text{B} \to 1\), etc.).
The division method generates hash values by applying simple arithmetic to the nonnegative integer key \(k\):
\[\begin{equation*} h(k) = k \bmod m, \end{equation*}\]
where:

- \(k\) is the nonnegative integer key,
- \(m\) is the number of slots in the hash table, and
- \(k \bmod m\) is the remainder of dividing \(k\) by \(m\).
To help the division method spread keys more evenly, choose \(m\) to be a prime number. Using a prime breaks many simple patterns in the keys (for example, when lots of keys share the same last digit or are multiples of a fixed number), which otherwise can cause many keys to land in the same slots. Also avoid choosing \(m\) close to a power of 2 (2, 4, 8, 16, …), since patterns in the low-order bits can then show up directly in \(k \bmod m\).
Using the division method with \(m=11\), the hash values for the keys 56, 29, 90, 40, 82, 30, and 4 are computed as follows:
\(h(56) = 1\), \(h(29) = 7\), \(h(90) = 2\), \(h(40) = 7\), \(h(82) = 5\), \(h(30) = 8\), and \(h(4) = 4\).
Here, the keys 29 and 40 collide because they hash to the same slot: \(h(29)=h(40)=7\).
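The example can be checked in a few lines of Python:

```python
m = 11
for k in [56, 29, 90, 40, 82, 30, 4]:
    print(k, "->", k % m)
# 56 -> 1, 29 -> 7, 90 -> 2, 40 -> 7, 82 -> 5, 30 -> 8, 4 -> 4
# 29 and 40 collide in slot 7.
```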
What is the hash value of the key 100 when using the division method with the divisor \(m = 13\)?
The multiplication method applies the hash function
\[\begin{equation*} h(k) = \lfloor m (k A \bmod 1) \rfloor, \end{equation*}\]
where:

- \(m\) is the number of slots in the hash table,
- \(A\) is a real constant with \(0 < A < 1\), and
- \(kA \bmod 1 = kA - \lfloor kA \rfloor\) is the fractional part of \(kA\).
Consider a hash table of size \(m = 1000\) and a corresponding hash function \(h(k)= \lfloor m (kA \bmod 1) \rfloor\) for \(A = (\sqrt{5} - 1) / 2\). Compute the locations to which the keys 61, 62, 63, 64, and 65 are mapped.
The hash values are 700, 318, 936, 554 and 172, respectively.
Because \(m\) is not an integer power of 2, the multiplication cannot be computed with fast bit-shift operations. Therefore, this particular hash function is inefficient to evaluate.
Moreover, all hash values are even numbers, indicating that this hash function does not achieve uniform hashing. Thus, it is unsuitable for practical use.
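The following snippet reproduces the worked example and confirms that all five hash values come out even:

```python
import math

m = 1000
A = (math.sqrt(5) - 1) / 2            # golden-ratio conjugate, ~0.618

def h(k):
    frac = (k * A) % 1.0              # fractional part of k * A
    return math.floor(m * frac)

print([h(k) for k in range(61, 66)])  # [700, 318, 936, 554, 172]
```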
Definition
A hashing algorithm is called universal if it satisfies the following two conditions:

- The hash function is chosen randomly from a family of hash functions at the start of each program execution, independently of the keys that will be stored.
- For any two distinct keys \(k_1\) and \(k_2\), the probability that \(h(k_1) = h(k_2)\) is at most \(1/m\), where the probability is taken over the random choice of the hash function.
Universal hashing mitigates worst-case scenarios where many keys hash to the same slot during each execution of the program.
For examples, refer to Section 11.3.4 in Cormen et al. (2022).
Even the best hash functions cannot completely eliminate collisions between keys. There are two common approaches to resolving collisions in hash tables:

- chaining, where all keys that hash to the same slot are stored in a linked list attached to that slot, and
- open addressing, where colliding keys are stored in other, unoccupied slots of the table itself.
In the example depicted below, the keys \(k_4\) and \(k_5\) collide. Thus, they are stored together with their values in a linked list:
The following definition is useful for expressing the average running time of dictionary operations when using chaining:
Definition
The load factor \(\alpha\) is the ratio of the number of keys \(n\) stored in the hash table to the number of slots \(m\) in the table:
\[\begin{equation*} \alpha = \frac{n}{m} \end{equation*}\]
The load factor \(\alpha\) can be interpreted as the average length of a linked list associated with a randomly chosen slot.
\begin{algorithm}
\begin{algorithmic}
\Procedure{Chained-Hash-Search}{$T$, $k$}
\Return \textsc{List-Search}($T[h(k)]$, $k$)
\EndProcedure
\Procedure{Chained-Hash-Insert}{$T$, $x$}
\State \textsc{List-Prepend}($T[h(x.\mathit{key})]$, $x$)
\EndProcedure
\Procedure{Chained-Hash-Delete}{$T$, $x$}
\State \textsc{List-Delete}($T[h(x.\mathit{key})]$, $x$)
\EndProcedure
\end{algorithmic}
\end{algorithm}
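A sketch of a chained hash table in Python, reusing the division method with \(m = 11\) from the earlier example; Python lists stand in for the linked lists, so appending plays the role of List-Prepend:

```python
class ChainedHashTable:
    """Collision resolution by chaining."""
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one chain per slot

    def _h(self, k):
        return k % self.m         # division method as the hash function

    def search(self, k):
        for key in self.slots[self._h(k)]:
            if key == k:
                return key
        return None

    def insert(self, k):
        self.slots[self._h(k)].append(k)

    def delete(self, k):
        self.slots[self._h(k)].remove(k)

T = ChainedHashTable(11)
for k in [56, 29, 90, 40]:
    T.insert(k)
print(T.slots[7])   # [29, 40] -- the colliding keys share one chain
```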
| Operation | Average Running Time | Comments |
|---|---|---|
| Chained-Hash-Search | \(O(1 + \alpha)\) | Derived in Section 11.2 of Cormen et al. (2022). |
| Chained-Hash-Insert | \(O(1)\) | |
| Chained-Hash-Delete | \(O(1)\) | Assuming the list is doubly linked. |
While chaining resolves collisions by storing linked lists outside the hash table, open addressing stores all keys directly in the slots of the hash table. Each slot contains either a key or nil.
Unlike chaining, open addressing allows at most one key per slot, so the load factor \(\alpha\) can never exceed 1. If a user attempts to insert a key into a full hash table, an error is reported.
When searching for a key, the algorithm systematically examines table slots until it either finds the desired key or determines that the key is not in the table.
To perform insertion using open addressing, we probe the hash table until an empty slot is found for the key. The sequence of probes depends on the key being inserted.
To determine which slots to probe, the hash function is extended to include the probe number as a second input:
\[\begin{equation*} h: U \times \{0, 1, \ldots, m - 1\} \to \{0, 1, \ldots, m - 1\}. \end{equation*}\]
The probe sequence \(\langle h(k, 0), h(k, 1), \ldots, h(k, m - 1) \rangle\) must be a permutation of \(\langle 0, 1, \ldots, m - 1 \rangle\), ensuring that every hash-table position is eventually considered as a slot for a new key as the table fills up.
For simplicity, assume that no keys have been deleted from the hash table so far. The Hash-Insert-Without-Deleted procedure takes as input a hash table \(T\) and a key \(k\).
The procedure either returns the slot number where \(k\) is stored or flags an error if the hash table is already full:
\begin{algorithm}
\begin{algorithmic}
\Procedure{Hash-Insert-Without-Deleted}{$T$, $k$}
\State $i = 0$
\Repeat
\State $q = h(k, i)$
\If{$T[q] \texttt{==}$ \textsc{nil}}
\State $T[q] = k$
\Return $q$
\Else
\State $i = i + 1$
\EndIf
\Until{$i$ \texttt{==} $m$}
\State \textbf{error} ``hash table overflow''
\EndProcedure
\end{algorithmic}
\end{algorithm}
The algorithm for searching for a key \(k\) probes the same sequence of slots that was examined when key \(k\) was inserted.
\begin{algorithm}
\begin{algorithmic}
\Procedure{Hash-Search}{$T$, $k$}
\State $i = 0$
\Repeat
\State $q = h(k, i)$
\If{$T[q] \texttt{==} k$}
\Return $q$
\EndIf
\State $i = i + 1$
\Until{$T[q] \texttt{==} $\textsc{nil} or $i \texttt{==} m$}
\Return \textsc{nil}
\EndProcedure
\end{algorithmic}
\end{algorithm}
Deletion from an open-address hash table is challenging. When a key is deleted from slot \(q\), we cannot simply mark that slot as empty by storing nil in it. Doing so might prevent the retrieval of any key whose insertion involved probing slot \(q\) and finding it occupied.
We can solve this problem by marking the slot as deleted instead of nil.
In the figure below, the hash function is assumed to be \(h(k, i) = (k + i) \bmod 5\).
If slot 2 is marked as nil after deleting 32, Hash-Search\((T, 76)\) would return nil, incorrectly indicating that key 76 is not in the hash table.
However, if slot 2 is marked as deleted, Hash-Search\((T, 76)\) finds key 76 in slot 3.
If keys have been deleted from the hash table, the insert procedure must also check for slots marked deleted. Such slots can be reused for new keys:
\begin{algorithm}
\begin{algorithmic}
\Procedure{Hash-Insert}{$T$, $k$}
\State $i = 0$
\Repeat
\State $q = h(k, i)$
\If{$T[q] \texttt{==}$ \textsc{nil} or $T[q] \texttt{==}$ \textsc{deleted}}
\State $T[q] = k$
\Return $q$
\Else
\State $i = i + 1$
\EndIf
\Until{$i$ \texttt{==} $m$}
\State \textbf{error} ``hash table overflow''
\EndProcedure
\end{algorithmic}
\end{algorithm}
The only difference from Hash-Insert-Without-Deleted is the additional check for deleted in line 5.
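A Python sketch combining insertion, search, and deletion with a deleted marker; it uses the linear-probing hash \(h(k, i) = (k + i) \bmod 5\) from the figure's example. The key 11 is an assumption of mine, standing in for whatever key occupied slot 1 in the figure:

```python
NIL = None
DELETED = object()   # distinct sentinel marking deleted slots

class OpenAddressTable:
    """Open addressing with the probe sequence h(k, i) = (k + i) mod m."""
    def __init__(self, m):
        self.m = m
        self.slots = [NIL] * m

    def _h(self, k, i):
        return (k + i) % self.m

    def insert(self, k):
        """Mirror of Hash-Insert: reuse both nil and deleted slots."""
        for i in range(self.m):
            q = self._h(k, i)
            if self.slots[q] is NIL or self.slots[q] is DELETED:
                self.slots[q] = k
                return q
        raise RuntimeError("hash table overflow")

    def search(self, k):
        """Mirror of Hash-Search: stop at nil, but probe past deleted."""
        for i in range(self.m):
            q = self._h(k, i)
            if self.slots[q] == k:
                return q
            if self.slots[q] is NIL:
                return None
        return None

    def delete(self, k):
        """Mark the key's slot as deleted rather than nil."""
        q = self.search(k)
        if q is not None:
            self.slots[q] = DELETED

T = OpenAddressTable(5)
for k in [11, 32, 76]:   # hypothetical key 11 occupies slot 1
    T.insert(k)          # 11 -> slot 1, 32 -> slot 2, 76 -> slot 3
T.delete(32)             # slot 2 now holds DELETED, not NIL
print(T.search(76))      # 3 -- the probe passes over the deleted slot
```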
Hash functions used for open addressing should ideally perform uniform hashing, defined by two criteria:

- For each key, the probe sequence is equally likely to be any of the \(m!\) permutations of \(\langle 0, 1, \ldots, m - 1 \rangle\).
- The probe sequences of distinct keys are independent of one another.
Violating either criterion can cause clustering of keys in certain parts of the hash table, which in turn can degrade the performance of the hash table.
The implementation of true uniform hashing is challenging. Most hash functions that are used in practice do not generate all of the \(m!\) possible permutations.
In this lesson, we discuss two methods that guarantee that the probe sequence \(\langle h(k, 0), h(k, 1), \ldots, h(k, m - 1) \rangle\) is a permutation of \(\langle 0, 1, \ldots, m - 1\rangle\) for each key \(k\):

- linear probing and
- double hashing.
Given a hash function \(h': U \to \{0, 1, \ldots, m - 1\}\), which we refer to as an auxiliary hash function, the method of linear probing uses the hash function
\[\begin{equation*} h(k, i) = (h'(k) + i) \bmod m \end{equation*}\]
for \(i = 0, 1, \ldots, m - 1\).
In the example on the right, keys were inserted in the sequence \(\langle 42, 53, 14, 92, 27, 67 \rangle\), with \(h'(k) = k\) and \(m = 13\). The result exhibits clustering: long runs of occupied slots form, which increases the average search time.
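The clustering can be reproduced in Python; each printed list shows the slots probed while inserting the key, and the probe counts grow as the cluster of occupied slots 1 through 6 lengthens:

```python
m = 13

def probe_linear(table, k):
    """Insert k with h(k, i) = (k + i) mod m; return the probes used."""
    probes = []
    for i in range(m):
        q = (k + i) % m
        probes.append(q)
        if table[q] is None:
            table[q] = k
            return probes
    raise RuntimeError("hash table overflow")

table = [None] * m
for k in [42, 53, 14, 92, 27, 67]:
    print(k, "probes", probe_linear(table, k))
# 42 probes [3]
# 53 probes [1]
# 14 probes [1, 2]
# 92 probes [1, 2, 3, 4]
# 27 probes [1, 2, 3, 4, 5]
# 67 probes [2, 3, 4, 5, 6]
```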
Double hashing uses a hash function of the form
\[\begin{equation*} h(k, i) = (h_1(k) + i h_2(k)) \bmod m, \end{equation*}\]
where both \(h_1\) and \(h_2\) are auxiliary hash functions.
The value of \(h_2(k)\) must be relatively prime to \(m\) so that the entire hash table can be searched. This property can be established, for example, in either of the following two ways:

- choose \(m\) to be a power of 2 and design \(h_2\) so that it always produces an odd number, or
- choose \(m\) to be prime and design \(h_2\) so that it always returns a positive integer less than \(m\).
In the example on the right,
\[\begin{equation*} h(k, i) = (h_1(k) + i h_2(k)) \bmod m, \end{equation*}\]
with \(h_1(k) = k\) and \(h_2(k) = 1 + (k \bmod 12)\). As before, keys were inserted in the sequence \(\langle 42, 53, 14, 92, 27, 67 \rangle\). Note that we never had to probe any sequence beyond \(i = 1\), which provides evidence that double hashing is less prone to clustering than linear probing.
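The same experiment in Python confirms that no insertion probes beyond \(i = 1\):

```python
m = 13

def h2(k):
    return 1 + (k % 12)

def probe_double(table, k):
    """Insert k with h(k, i) = (k + i * h2(k)) mod m; return the probes."""
    probes = []
    for i in range(m):
        q = (k + i * h2(k)) % m
        probes.append(q)
        if table[q] is None:
            table[q] = k
            return probes
    raise RuntimeError("hash table overflow")

table = [None] * m
for k in [42, 53, 14, 92, 27, 67]:
    print(k, "probes", probe_double(table, k))
# 42 probes [3]
# 53 probes [1]
# 14 probes [1, 4]
# 92 probes [1, 10]
# 27 probes [1, 5]
# 67 probes [2]
```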
Although double hashing produces only \(m^2\) out of the \(m!\) possible probe sequences, its performance is practically as good as the ideal scheme of uniform hashing.
One can show that, assuming uniform hashing, the average time needed for searching an open-address hash table depends on the load factor \(\alpha = n / m\) as follows:

- An unsuccessful search performs at most \(\frac{1}{1 - \alpha}\) probes on average, i.e., it runs in \(O\left(\frac{1}{1 - \alpha}\right)\) time.
- A successful search performs at most \(\frac{1}{\alpha} \ln \frac{1}{1 - \alpha}\) probes on average, i.e., it runs in \(O\left(\frac{1}{\alpha} \log \frac{1}{1 - \alpha}\right)\) time.
Note that \(n\) includes the number of deleted keys because they still occupy slots in the hash table.
As the hash table fills up (i.e., as \(\alpha\) approaches 1 from below), both of the upper bounds \(O\left(\frac{1}{1 - \alpha}\right)\) and \(O\left(\frac{1}{\alpha} \log \frac{1}{1 - \alpha}\right)\) diverge. When \(\alpha\) would exceed 1, open addressing cannot be used for resolving collisions because there is insufficient space in the hash table.
However, if we can guarantee that \(\alpha\) never exceeds a constant less than 1, then searching takes only \(O(1)\) time on average.
While hash tables are efficient data structures for inserting, searching, and deleting keys, they are not suitable for all applications. For instance, they are not well-suited for finding the smallest or largest key in a set.
Next week, we will discuss binary search trees, which are data structures that enable fast retrieval of minimal and maximal keys while still allowing efficient search, insertion, and deletion of keys.