May 13, 2026

Beyond Ports, Adapters, and Simulators: P/A/S with “Pay-It-Forward” Extensions (or: The Virtuous Cycle of Provider-Owned Simulators)

1. Ports, Adapters, and Simulators

In 2005, Alistair Cockburn introduced the Hexagonal Architecture, later renamed Ports and Adapters. The central idea is to isolate application logic from the outside world by defining explicit boundaries (ports) through which all interaction flows. Infrastructure concerns (databases, file systems, HTTP services) connect to those ports through adapters, and the application core never references infrastructure directly.

“Allow an application to equally be driven by users, programs, automated test or batch scripts, and to be developed and tested in isolation from its eventual run-time devices and databases.” – Alistair Cockburn, Hexagonal Architecture (2005) https://alistair.cockburn.us/hexagonal-architecture/

The pattern gained widespread adoption in the testing community through Steve Freeman and Nat Pryce’s Growing Object-Oriented Software, Guided by Tests (Addison-Wesley, 2009), which introduced a third element: the simulator. A simulator implements the same port interface as the real adapter but uses an in-memory data structure instead of real infrastructure. Tests run against the simulator; production runs against the adapter. The application core cannot tell the difference.

The three elements together (Port, Adapter, Simulator) give developers a clean architecture and a fast, isolated test suite. A port is an interface. An adapter is a real implementation of that interface. A simulator is a lightweight stand-in used in tests.

A note on terminology. The testing literature uses related terms with varying precision. Test double (Meszaros, xUnit Test Patterns, 2007) is the umbrella category for any object that stands in for a real dependency in a test. Within that category, a stub returns canned answers, a mock verifies call expectations, and a fake is a working simplified implementation that takes shortcuts inappropriate for production. This paper uses simulator specifically to mean a provider-maintained fake validated against the same interface contract as the production adapter. That distinction is the subject of what follows.

This paper accepts all of that. It then asks a question the original formulation leaves unanswered.

Relationship to existing practice

A reader familiar with the testing ecosystem will recognize pieces of what this paper proposes. Provider-maintained in-memory implementations already exist: SQLite ships with an in-memory mode, LocalStack emulates AWS services locally, embedded Kafka and WireMock serve as test doubles for their respective protocols, and many database drivers include in-memory variants. Contract test suites and Technology Compatibility Kits (TCKs) already exist to validate behavioral compatibility across implementations. These are real, valued practices.

This paper does not claim to invent them.

The contribution here is a different claim: that in a Ports and Adapters architecture, simulator ownership follows naturally from contract ownership, and therefore belongs structurally to the provider rather than incidentally to the consumer.

That distinction matters. TCK ecosystems assume many independent third-party implementations and place the provider in the role of validating outsiders. The model here assumes the provider themselves ships the simulator as part of the package — the same package that ships the port and the adapter. The consumer does not certify compatibility; they simply use what the provider already certified.

The existing ecosystem mostly treats provider-maintained simulators as convenient extras: optional ergonomics, ecosystem kindness, testing utilities. This paper treats them as a structural consequence of contract ownership — something that follows from the architecture rather than something layered on top of it.

Much of the influential literature in software engineering works this way. Domain-Driven Design did not invent layered systems or domain modeling. Clean Architecture did not invent dependency inversion. What those works contributed was formalization, naming, and the conversion of scattered practice into explicit doctrine. The goal here is the same: to take something practitioners do in places and argue that they should do it everywhere, for a reason that is architectural rather than merely pragmatic.


2. Building a To-Do Application with P/A/S

To make the discussion concrete, consider a simple key-value database offered as a package. Its port defines the operations a consumer can perform: create, delete, and list tables, and set, get, delete, and list keys within a table.

// kv-database/src/port.ts
export interface IKeyValueStore {
  createTable(name: string): void;
  deleteTable(name: string): void;
  listTables(): string[];

  set(table: string, key: string, value: string): void;
  get(table: string, key: string): string | null;
  delete(table: string, key: string): void;
  listKeys(table: string): string[];
}

The adapter backs this interface with the file system. Each table is a JSON file on disk.

// kv-database/src/adapter.ts
import * as fs from "fs";
import * as path from "path";
import { IKeyValueStore } from "./port";

export class FileSystemKeyValueStore implements IKeyValueStore {
  constructor(private readonly dataDir: string) {
    if (!fs.existsSync(dataDir)) fs.mkdirSync(dataDir, { recursive: true });
  }

  private tablePath(name: string): string {
    return path.join(this.dataDir, `${name}.json`);
  }

  private readTable(name: string): Record<string, string> {
    return JSON.parse(fs.readFileSync(this.tablePath(name), "utf-8"));
  }

  private writeTable(name: string, data: Record<string, string>): void {
    fs.writeFileSync(this.tablePath(name), JSON.stringify(data));
  }

  set(table: string, key: string, value: string): void {
    const data = this.readTable(table);
    data[key] = value;
    this.writeTable(table, data);
  }

  get(table: string, key: string): string | null {
    return this.readTable(table)[key] ?? null;
  }

  delete(table: string, key: string): void {
    const data = this.readTable(table);
    delete data[key];
    this.writeTable(table, data);
  }

  // ... createTable, deleteTable, listTables, listKeys
}

A team building a to-do application wants to use this database. Following P/A/S, they inject IKeyValueStore into their TodoList class and never reference the adapter directly.

// todo-app/src/todo.ts
import { IKeyValueStore } from "kv-database";

const TABLE = "todos";

export class TodoList {
  constructor(private readonly store: IKeyValueStore) {
    store.createTable(TABLE);
  }

  add(id: string, text: string): void { this.store.set(TABLE, id, text); }
  get(id: string): string | null      { return this.store.get(TABLE, id); }
  remove(id: string): void            { this.store.delete(TABLE, id); }
  list(): string[]                    { return this.store.listKeys(TABLE); }
}

So far so good. Now the team needs to write tests. They do not want to run against the real file system in every test, so they do what P/A/S prescribes: they write a simulator.

// todo-app/src/simulator.ts  (written by the consumer)
import { IKeyValueStore } from "kv-database";

export class TodoStoreSimulator implements IKeyValueStore {
  private tables = new Map<string, Map<string, string>>();

  createTable(name: string): void { this.tables.set(name, new Map()); }
  deleteTable(name: string): void { this.tables.delete(name); }
  listTables(): string[]          { return Array.from(this.tables.keys()); }

  set(table: string, key: string, value: string): void {
    this.tables.get(table)!.set(key, value);
  }

  get(table: string, key: string): string | null {
    return this.tables.get(table)!.get(key) ?? null;
  }

  delete(table: string, key: string): void {
    const t = this.tables.get(table)!;
    const value = t.get(key);
    if (value !== undefined) {
      t.delete(key);
      t.set(`__deleted_${Math.random().toString(36).slice(2)}`, value);
    }
  }

  listKeys(table: string): string[] {
    return Array.from(this.tables.get(table)!.keys());
  }
}

They write tests against it and everything passes.

// todo-app/tests/todo.test.ts
import { TodoList } from "../src/todo";
import { TodoStoreSimulator } from "../src/simulator";

describe("TodoList", () => {
  let todos: TodoList;

  beforeEach(() => {
    todos = new TodoList(new TodoStoreSimulator());
  });

  it("retrieves a todo that was added", () => {
    todos.add("1", "Buy milk");
    expect(todos.get("1")).toBe("Buy milk");  // passes
  });

  it("returns null for a todo that was removed", () => {
    todos.add("1", "Buy milk");
    todos.remove("1");
    expect(todos.get("1")).toBeNull();         // passes
  });
});
PASS tests/todo.test.ts
  TodoList
    ✓ retrieves a todo that was added
    ✓ returns null for a todo that was removed

The tests pass. The team ships the application.


3. The Problem with Consumer-Written Simulators

Look carefully at what just happened. The TodoStoreSimulator was written by the team building the to-do application – a team whose expertise is to-do lists, not key-value stores. They built the simulator based on their own assumptions about how IKeyValueStore should behave. Nobody reviewed those assumptions against the actual adapter.

This is a structural problem, not a one-off mistake.

Every team that consumes IKeyValueStore will face the same situation. Each will build their own simulator. Each will encode their own assumptions. Each will write their own tests against those assumptions. The effort is duplicated across every consumer, and the correctness of each simulator is guaranteed by nothing more than the consumer’s best guess.

There is also a subtler risk. A consumer’s knowledge of an external system is almost always incomplete. It is built by reading documentation, inspecting the interface, and observing behavior in a handful of scenarios. Edge cases (the kind of behavior that only surfaces under specific sequences of operations) are systematically underrepresented. They are, by definition, the cases the consumer did not think of.

P/A/S prescribes that a simulator should exist but says nothing about who should write it, how it should be validated, or how it should be kept accurate as the real adapter evolves.


4. The Bug the Tests Did Not Catch

The delete method in TodoStoreSimulator contains a bug. Instead of removing the key, it renames it to a randomly generated name.

delete(table: string, key: string): void {
  const t = this.tables.get(table)!;
  const value = t.get(key);
  if (value !== undefined) {
    t.delete(key);
    // BUG: stores the value under a random key instead of deleting it
    t.set(`__deleted_${Math.random().toString(36).slice(2)}`, value);
  }
}

The tests did not catch this because they only called get() after remove(). Since get() looks up by the original key name, it correctly returns null – the original key is gone. But the data is not gone. It is sitting in the table under an invisible name, and list() would return it.

todos.add("1", "Buy milk");
todos.remove("1");

todos.get("1");   // null  ✓ -- original key gone
todos.list();     // ["__deleted_k3x9p2"] -- ghost entry, never tested

In production, against the real adapter, delete removes the key entirely. There is no ghost entry. The to-do application behaves correctly in production but incorrectly in tests – it is the simulator, not the application, that is wrong. And because the tests never called list(), no test ever observed the discrepancy.

This is the kind of bug that is easy to introduce and hard to detect. The simulator looks correct for the operations that were tested. It fails silently on the operations that were not.


5. What If the Provider Shared Their Tests?

The adapter’s test suite already validates the behavior of IKeyValueStore – including the behavior of delete followed by listKeys. If those tests could be run against the consumer’s simulator, the ghost-key bug would surface immediately.

One approach is for the provider to export their interface-level tests as part of the package, so consumers can run them against any simulator they write.

// consumer running the provider's exported tests against their own simulator
import { runContractTests } from "kv-database";
import { TodoStoreSimulator } from "../src/simulator";

runContractTests("TodoStoreSimulator", () => new TodoStoreSimulator());
FAIL
  ● IKeyValueStore contract -- TodoStoreSimulator › items › lists only keys that exist

    Expected: ["b"]
    Received: ["b", "__deleted_abix5ww9m3j"]

The bug is found. The ghost key surfaces in the first test that calls listKeys() after a deletion.

This is an improvement over the status quo. But it still has significant shortcomings.

The consumer still has to write the simulator. They still have to maintain it as the adapter evolves. The provider now has an additional public artifact to maintain (the exported test suite) which may not be compatible with the consumer’s testing framework of choice. And perhaps most importantly, this approach still places the burden of verification on the wrong party. The consumer must remember to run these tests, understand what they cover, and act on the results. Some will; others will not.

Sharing tests distributes the tools for validation without distributing the responsibility.

The insight is not that consumers need better tools to validate their own simulators. For most provider-owned contracts, consumers should rarely need to write foundational simulators at all.


6. Pay It Forward: The Provider Ships the Simulator

The Pay It Forward extension to P/A/S establishes a single new obligation: the provider ships the simulator.

The reasoning is direct. A port defines a behavioral contract. The adapter is one realization of that contract — the production realization. A simulator is another realization of the same contract — the test realization. Both are expressions of the same behavioral specification. Both are therefore owned by the same party: the provider who defined the contract in the first place. Assigning simulator ownership to the consumer is equivalent to asking the consumer to re-derive the specification from observation. That is both unnecessary and unreliable.

A provider following P/A/S/PIF delivers three artifacts, not two:

  1. Port – the interface contract
  2. Adapter – the production implementation
  3. Simulator – an in-memory implementation, written and maintained by the provider
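
In package terms, the provider's public surface might look like this (the module layout is an assumption for illustration; note that the contract tests are deliberately not exported):

// kv-database/src/index.ts -- the package exports all three artifacts
export type { IKeyValueStore } from "./port";
export { FileSystemKeyValueStore } from "./adapter";
export { KeyValueStoreSimulator } from "./simulator";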

The tests are not a public artifact. They remain inside the provider’s repository. Before every release, the provider runs the same suite that validates the adapter against the simulator. If the simulator passes, it ships. If it does not, it does not.

Separating the two test layers

For this discipline to work, the adapter’s tests must be organized into two distinct layers.

Interface-level tests validate behavior that is visible through the port: that a value set under a key can be retrieved, that a deleted key no longer appears in listings, that tables can be created and destroyed. These tests say nothing about files, directories, or persistence. Any correct implementation of IKeyValueStore must pass them.

Implementation-level tests validate behavior that is internal to the adapter: that a JSON file is created on disk when a table is created, that data survives a process restart, that the storage directory is initialized if it does not exist. These tests are meaningless for an in-memory implementation. They belong to the adapter alone.

// kv-database/tests/adapter.test.ts  -- implementation-level, adapter only
describe("FileSystemKeyValueStore -- file system internals", () => {
  it("creates a json file on disk when a table is created", () => {
    store.createTable("t1");
    expect(fs.existsSync(path.join(dir, "t1.json"))).toBe(true);
  });

  it("survives a reload -- data is readable after a new instance opens the same directory", () => {
    store.createTable("t1");
    store.set("t1", "x", "y");
    const reloaded = new FileSystemKeyValueStore(dir);
    expect(reloaded.get("t1", "x")).toBe("y");
  });

  // ...
});

The interface-level tests are kept separately and run against both implementations.

// kv-database/tests/interface.test.ts  -- interface-level, runs against adapter and simulator
runContractTests("FileSystemKeyValueStore", makeFileSystemStore);
runContractTests("KeyValueStoreSimulator", () => new KeyValueStoreSimulator());

If the simulator has a bug (any bug that affects behavior visible through the port) it will fail these tests before it ever reaches a consumer.


7. The Bug Is Found Before It Ships

Consider what would have happened had the key-value database team been following PIF from the beginning. When they wrote their simulator and ran the interface tests against it, any bug equivalent to the one in the consumer’s simulator would have been caught immediately.

PASS tests/interface.test.ts
  IKeyValueStore contract -- FileSystemKeyValueStore
    ✓ retrieves a value that was set
    ✓ returns null after a key is deleted
    ✓ does not list a key after it is deleted
    ✓ lists only keys that exist
    ...
  IKeyValueStore contract -- KeyValueStoreSimulator
    ✓ retrieves a value that was set
    ✓ returns null after a key is deleted
    ✓ does not list a key after it is deleted
    ✓ lists only keys that exist
    ...

The provider’s simulator passes because the provider wrote it with full knowledge of the contract. The ghost-key bug never makes it into the shipped package. No consumer encounters it. No consumer needs to write tests to catch it.


8. The Consumer Stops Writing Simulators

With the provider’s simulator available as part of the package, the consumer’s situation changes entirely. They delete their own simulator and replace it with the one provided.

// todo-app/tests/todo.test.ts
import { TodoList } from "../src/todo";
import { KeyValueStoreSimulator } from "kv-database";

describe("TodoList", () => {
  let todos: TodoList;

  beforeEach(() => {
    todos = new TodoList(new KeyValueStoreSimulator());
  });

  it("retrieves a todo that was added", () => {
    todos.add("1", "Buy milk");
    expect(todos.get("1")).toBe("Buy milk");
  });

  it("returns null for a todo that was removed", () => {
    todos.add("1", "Buy milk");
    todos.remove("1");
    expect(todos.get("1")).toBeNull();
  });
});
PASS tests/todo.test.ts
  TodoList
    ✓ retrieves a todo that was added
    ✓ returns null for a todo that was removed

The consumer’s tests pass, and this time with a simulator that is actually correct. The consumer no longer owns any simulator code. If the adapter’s behavior changes, the provider updates the simulator and validates it against their own tests before shipping. The consumer re-runs their tests against the updated simulator and any behavioral change is immediately visible – not as a production incident, but as a test failure.


9. Beyond Faithful Reproduction: Extending the Simulator

A simulator that faithfully reproduces the adapter’s normal behavior is already a significant asset. But a provider-maintained simulator can go further. Because the provider controls the simulator, they can expose hook points that allow consumers to inject non-default behaviors – errors, delays, or other conditions that would be difficult or impossible to trigger reliably against the real adapter.

Fault injection

A real key-value store can fail. The disk fills up. The process loses access to the data directory. A network-backed store drops the connection. A to-do application that never handles these conditions will behave unpredictably when they occur in production.

The simulator can expose an error injection API: a way to tell it that the next call to a given operation should throw a specific error.

// kv-database/src/simulator.ts
import { IKeyValueStore } from "./port";

type Operation =
  | "createTable" | "deleteTable" | "listTables"
  | "set" | "get" | "delete" | "listKeys";

export class KeyValueStoreSimulator implements IKeyValueStore {
  private tables = new Map<string, Map<string, string>>();
  private injectedErrors = new Map<Operation, Error>();

  injectError(operation: Operation, error: Error): void {
    this.injectedErrors.set(operation, error);
  }

  private checkInjection(operation: Operation): void {
    const error = this.injectedErrors.get(operation);
    if (error) {
      this.injectedErrors.delete(operation);
      throw error;
    }
  }

  get(table: string, key: string): string | null {
    this.checkInjection("get");
    return this.tables.get(table)!.get(key) ?? null;
  }

  // ... other operations follow the same pattern
}

The consumer can now write tests that verify how their application behaves under failure, without needing a real store or a way to force it to fail on demand.

it("propagates a store error to the caller", () => {
  simulator.injectError("get", new Error("Store unavailable"));
  expect(() => todos.get("1")).toThrow("Store unavailable");
});

it("recovers after a transient error -- the next call succeeds", () => {
  todos.add("1", "Buy milk");
  simulator.injectError("get", new Error("Transient failure"));
  expect(() => todos.get("1")).toThrow("Transient failure"); // first call fails
  expect(todos.get("1")).toBe("Buy milk");                  // second call succeeds
});

Other hook points a provider might offer include latency simulation (to verify timeout handling or progress indicators) and call recording (to inspect what operations were performed and in what order). Latency simulation requires an asynchronous port interface, which is worth considering when designing the port if the real adapter performs any I/O.
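
A call-recording hook, for instance, might look like the sketch below. The RecordedCall shape and method names are invented for illustration, not part of any established API.

// kv-database/src/simulator.ts -- sketch of a call-recording hook
interface RecordedCall {
  operation: Operation;
  args: unknown[];
}

export class KeyValueStoreSimulator implements IKeyValueStore {
  private recordedCalls: RecordedCall[] = [];

  private record(operation: Operation, ...args: unknown[]): void {
    this.recordedCalls.push({ operation, args });
  }

  getRecordedCalls(): readonly RecordedCall[] {
    return this.recordedCalls;
  }

  set(table: string, key: string, value: string): void {
    this.record("set", table, key, value);
    this.checkInjection("set");
    this.tables.get(table)!.set(key, value);
  }

  // ... other operations record themselves the same way
}

// in a consumer test: inspect which operations ran, and in what order
// expect(sim.getRecordedCalls().map(c => c.operation)).toEqual(["set"]);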

The simulator’s role: integration and end-to-end testing

It is worth being precise about where the simulator fits in a test suite, because it is not a replacement for unit tests.

Unit tests for the to-do application test its own logic in isolation. They use a plain mock (a minimal object whose methods are controlled by the test), not the simulator. The mock verifies that TodoList makes the right calls with the right arguments and responds correctly to what the port returns. The mock does not need to behave like a real store; it only needs to return what the test tells it to.

// Unit test: verifies that TodoList calls the right port methods
it("delegates remove to store.delete with the correct table and key", () => {
  const mockStore: jest.Mocked<IKeyValueStore> = {
    createTable: jest.fn(), deleteTable: jest.fn(), listTables: jest.fn(),
    set: jest.fn(), get: jest.fn(), delete: jest.fn(), listKeys: jest.fn(),
  };
  const todos = new TodoList(mockStore);
  todos.remove("1");
  expect(mockStore.delete).toHaveBeenCalledWith("todos", "1");
});

Integration and end-to-end tests use the simulator. They exercise whole use cases from the application’s public API down through the port and into the simulator, validating that the application works correctly as a whole – not just that it makes the right calls in isolation.

// Integration test: verifies a full use case end-to-end
it("a todo added and then removed is no longer listed", () => {
  todos.add("1", "Buy milk");
  todos.add("2", "Walk the dog");
  todos.remove("1");
  expect(todos.list()).toEqual(["2"]);
});

The simulator enables both layers. Its faithful reproduction of the real adapter’s behavior makes integration tests trustworthy. Its hook points make it possible to exercise failure paths that would otherwise require elaborate test infrastructure – or a lot of luck.


10. Tradeoffs and Fidelity Limits

When a simulator is worth maintaining

A provider-maintained simulator is an investment. Writing it is straightforward for a clean, stable interface; keeping it faithful as the real adapter evolves is ongoing work. Not every adapter justifies that investment equally.

Simulators deliver the most value when the port interface is stable, the contract is deterministic (the same inputs reliably produce the same outputs), and the real adapter is expensive or slow to run in tests — network calls, disk I/O, external process startup, or cloud service fees.

Simulators become costly and unreliable when the real system has complex consistency semantics (distributed databases, event logs with ordering guarantees), when behavior depends on timing or concurrency, or when the protocol is large enough that a faithful in-memory implementation would amount to a second full implementation. In those cases, a scoped simulator that faithfully covers the behaviors most relevant to consumers — with explicit documentation of what it does and does not reproduce — is often more useful than an ambitious but imperfect one.

The fidelity boundary

A simulator validates contract-visible behavior: the operations defined on the port, the values they return, and the errors they can raise. It does not attempt to reproduce:

  • Consistency anomalies under concurrent access
  • Network partitions and split-brain scenarios
  • Timing-dependent behavior such as lock expiry or connection timeouts
  • Infrastructure-level concerns such as file system permissions or kernel signals

This is not a shortcoming of the pattern. It is a feature of the port boundary. The port defines what the application is allowed to observe. Behavior that cannot be observed through the port is behavior the simulator has no obligation to reproduce.

What follows from this is that a simulator-based integration test suite, however thorough, is not a substitute for testing against the real system. A simulator confirms that an application uses the contract correctly. Real-system tests confirm that the contract behaves correctly under production conditions. Both are necessary; they answer different questions.


11. Conclusion: The Pattern Composes

There is one more observation worth making. The FileSystemKeyValueStore adapter does not operate in a vacuum. It takes its own dependency: the file system. Reading and writing files is itself an operation on an external system, with its own observable behavior, its own failure modes, and its own contract.

The file system is a port. The operating system’s file system API is an adapter. And in a fully realized P/A/S/PIF architecture, the OS vendor (or the library providing file system abstractions) would ship a file system simulator alongside the real implementation, validated by the same interface-level tests before every release.

If that were the case, FileSystemKeyValueStore could be tested without ever touching real disk. Its dependency on the file system would be injected through a port, and the file system simulator would be provided by whoever maintains the file system abstraction. Any bugs in that simulator would be caught by the file system’s own interface tests, not discovered at runtime inside the key-value adapter’s test suite.
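
Sketched as a port, that boundary might look like this (the interface shape is hypothetical):

// fs-abstraction/src/port.ts -- a hypothetical file system port
export interface IFileSystem {
  readFile(path: string): string;
  writeFile(path: string, data: string): void;
  exists(path: string): boolean;
  mkdir(path: string): void;
}

FileSystemKeyValueStore would then take an IFileSystem through its constructor instead of calling fs directly, and its implementation-level tests could run against the file system simulator rather than real disk.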

This is where the pattern stops being testing advice and becomes a theory about dependency boundaries.

Each adapter in a system is also a consumer of other adapters. Simulator ownership composes transitively through the dependency graph: every provider owns the simulator for the contracts they define, and every adapter — being itself a consumer of deeper ports — benefits from the simulators that its own dependencies provide. Where a faithful simulator can be built and maintained without unreasonable cost, the obligation to ship it belongs to the provider — not because the consumer cannot build one, but because the provider is the only party with the knowledge and the tests to verify one. When each layer in a dependency chain honors that obligation within its means, the entire system becomes progressively more testable in isolation, with growing confidence that each simulator faithfully represents the system it stands in for.

Where the contract is stable and the simulator can be maintained economically, the cost of that confidence is modest: write the simulator yourself, test it against your own contract, and ship it. Pay it forward.

December 13, 2020

Working with hashes in C++

This post was originally shared as a knowledge transfer session after I investigated a performance issue with some code that implemented a combined hash using a simple addition operation.

Introduction/background

Contrary to other programming languages, C++ doesn't have a root base type the way Java and C# have the Object class.

In those two languages, the Object class defines two special methods:

  • Equality comparison (Equals in C#, equals in Java)
  • Hash code calculation (GetHashCode in C#, hashCode in Java)

These methods have certain properties that must hold true (although this isn't enforced by any of the compilers or runtime).

For two given instances A and B, if A == B then Hash(A) == Hash(B).

For example:

Person A("John", "Doe");
Person B("John", "Doe");
Person C("Jane", "Doe");

If A == B returns true, it's expected that Hash(A) and Hash(B) return the same value K

In the case where A != C, then there is nothing special about Hash(A) and Hash(C)

Extending the same concept to collections, if we have an ordered collection (either explicitly, like an array where items have an index, or implicitly, like a sorted collection), the same property generally holds:

var A = new int[] { 1, 2, 3 };
var B = new int[] { 1, 2, 3 };
var C = new int[] { 3, 2, 1 };

We have the exact same situation as above, while in the next case:

var A = new HashSet<int>() { 1, 2, 3 };
var B = new HashSet<int>() { 3, 2, 1 };

It should be true that A == B and that the hashes of A and B are the same (in this case the collection is not ordered). Ignoring syntax and the particularities of specific languages, this is the expected behavior of == and hash.

Since those two languages have the common root class Object, the equality and hash-code methods are defined as virtual methods there. Each derived class that plans to use equality and hash-related operations should override them. If you don't, you'll have to live with the default implementations, which, knowing nothing about the business logic behind your class, end up just comparing memory addresses. If you don't override Equals for the class Person above:

  • A == A will return true
  • A == B will return false

Why does it matter? In two words: object identity.

Identity matters

Objects should be considered equal not because they occupy the same memory address (two distinct objects never can), but because they represent the same entity. That way, an object constructed from the data a user typed into a form (like the Person with first name "John" and last name "Doe") is the same as an object constructed from data read from a database. Otherwise you would never be able to compare objects that represent the same entity but were created differently.

The case for the hash

In addition to the equality comparer, hash structures rely on a small digest (the hash) calculated from a bigger object. So if you want to add objects to a hash table, you need to be able to calculate hash values correctly and consistently. Still using the examples above: if you build a hash table keyed by the type Person, and the hashes of A and B are different values (say you forgot to override the HashCode method), the "same person" can end up in two positions in the hash table. So an operation like:

var myHash = new Dictionary<Person, int>(); // int is the age
myHash[A] = 0; // person is born
Person.WriteToDatabase(A); // save to the database
…
var B = Person.ReadFromDatabase(…); // read previously stored
myHash[B]++; // happy birthday

throws an exception because myHash[B] doesn't exist.

But why do I need both Equals and HashCode? Isn't HashCode sufficient?

HashCode would be enough if the computer had enough memory to store all possible hash values and the hash values were 100% evenly distributed, neither of which is true. There will be collisions, where two different objects yield the same hash value. In those cases, how do you tell whether they are the same object or not? By using the equality operation. If two objects have the same hash value but are not equal, they are treated as different entities, and some collision management has to happen in the hash structure to give the second object a different home.

Summarizing

If you are creating a new class XYZ that you intend to use as a key to a hash table, you need to override both Equals and HashCode operations, otherwise the hash table will be either inefficient or simply not work.

What about C++?

Now let's look at the particular case of C++… C++ has no common ancestor for all objects. Any class can define its own operator== (none is generated for you by default), but that only solves half of our problem. Without a common hash method, there is no easy way to get arbitrary classes to work with the hash structures.

Going back to C#, that language has a very interesting feature called "extension methods". Good OO practice says to seal your classes so that behavior cannot be overridden by consumers (imagine an Authenticate class that is not sealed, letting an attacker override the CheckPassword method…), and most of the time you can't modify framework or 3rd party classes anyway. To add new operations to existing classes, C# offers extension methods: syntactic sugar for static methods that take an implicit this parameter. So you can add a method WriteToLog to any class, even if that class doesn't define an abstract or virtual WriteToLog method.
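
For illustration, here is a hypothetical WriteToLog extension (the class and method names are invented for this example):

// C#: a static class with a static method taking an implicit "this" parameter
public static class LogExtensions
{
    public static void WriteToLog(this object target)
    {
        Console.WriteLine(target.ToString());
    }
}

// usage reads like an instance method, even though object defines no WriteToLog
var person = new Person("John", "Doe");
person.WriteToLog();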

The closest equivalent in C++ is an explicit specialization of an existing standard library template for your own type:

// specialize a standard template, e.g. std::hash, for your own type
template<>
struct std::hash<YourType> {
    // add your extension implementation here
};

The std::unordered_map container

This template is defined in header <unordered_map> with a few optional parameters:

template<
    class Key,
    class T,
    class Hash = std::hash<Key>,
    class KeyEqual = std::equal_to<Key>,
    class Allocator = std::allocator< std::pair<const Key, T> >
> class unordered_map;

Note that there is a default std::hash type for the Hash template parameter defined in header <functional>.

template< class Key >
struct hash;

std::hash<T> is defined for all the primitive types plus the pointer to any type. It is also defined for some of the STL types, like std::string and std::unique_ptr (cppreference has the full list). This template defines the operator()(const T&) so that you can call std::hash<T>()(something with type T). That's exactly the operation that is called by std::unordered_map to calculate the hash value.
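
For example, a quick sketch of calling the standard hasher directly:

// construct the hasher and invoke operator() on a value
std::hash<std::string> stringHasher;
std::size_t h = stringHasher("some value");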

You can also define a specific std::equal_to<T> if you are using a 3rd party class that didn't define operator==.

Notice that everything in C++ leans toward supporting existing code: we can't just go ahead and rebuild all the C++ code in the world to add a base Object class everywhere.

A practical (broken) example

How can things go wrong

Suppose you define a class Person like the following and try to use Person as a key.

#include <string>
#include <unordered_map>

struct Person {
    Person(const Person& other) : Person(other.First, other.Last) {}
    Person(const std::string& first, const std::string& last) 
        : First(first), Last(last) {}

    const std::string First;
    const std::string Last;
};

int main()
{
    std::unordered_map<Person, int> myMap;
}

This is the error you'll get:

Error C2280: 'std::_Uhash_compare<_Kty,_Hasher,_Keyeq>::_Uhash_compare(const std::_Uhash_compare<_Kty,_Hasher,_Keyeq> &)': attempting to reference a deleted function (unordered_map, line 51)

Very helpful and user friendly… Not! It doesn't even tell you where the error happened.

How to start making things right

You need to define a hash operation for the Person type, like this (add this code between the struct definition and main):

template<>
struct std::hash<Person> {
    std::size_t operator()(const Person& person) const {
        // std::string already has an implementation for std::hash
        // so we can combine the hash values of first and last name
        std::hash<std::string> stringHasher;

        return stringHasher(person.First) + stringHasher(person.Last);
    }
};

Now the code builds fine! But it's still broken…

It is missing the equality operator and you'll notice that as soon as you attempt to insert anything in the hash:

int main()
{
    std::unordered_map<Person, int> myMap;
    myMap.emplace(Person{ "John","Doe" }, 1);
}

This will fail with:

Error C2676: binary '==': 'const _Ty' does not define this operator or a conversion to a type acceptable to the predefined operator (xstddef, line 91)

If you add the code below also before main():

template<>
struct std::equal_to<Person> {
    bool operator()(const Person& lhs, const Person& rhs) const {
        // std::string already has an implementation for std::equal_to
        // so we leverage that for first and last name
        std::equal_to<std::string> stringEqualTo;

        return stringEqualTo(lhs.First, rhs.First)
            && stringEqualTo(lhs.Last, rhs.Last);
    }
};

Then it finally succeeds! To compile… But there is a bigger problem lingering around…

Let's check the behavior of the hash function by looking at 3 different people:

#include <iostream> // now needed for std::cout

int main()
{
    std::hash<Person> personHasher;
    std::equal_to<Person> personEqual;

    Person p1{ "John", "Doe" };
    Person p2{ "John", "Doe" };
    Person p3{ "Doe", "John" };

    std::cout 
        << "equal=" << personEqual(p1,p2) 
        << " p1 hash=" << personHasher(p1) 
        << " p2 hash=" << personHasher(p2) << std::endl;

    std::cout 
        << "equal=" << personEqual(p1,p3) 
        << " p3 hash=" << personHasher(p3) << std::endl;
}

That code outputs this:

equal=1 p1 hash=2540568255 p2 hash=2540568255
equal=0 p3 hash=2540568255

Wait a minute! The first two are expected, right? It's the equal/hash property: if A == B, then hash(A)==hash(B)

How about p3? They are different but have the same hash… Also expected, right? Because we can't guarantee that different values will have different hashes.

But then comes trouble… Looking closer at the Person class, it has an order: "John Doe" is not the same as "Doe John", so we should have taken that order into consideration in the hash calculation as well!

And this is where the C++ STL is missing a critical piece of functionality!

First, let's look at what went wrong…

Let's recap our definition of the Person hash:

return stringHasher(person.First) + stringHasher(person.Last);

See that if First and Last are simply switched the hash will be the same. This will also happen if the First and Last names are different but they yield the same hash.

What could be the impact of that?

Hash collisions are expected, but we should try to avoid them as much as possible, because they are expensive to work around. Techniques like open addressing or buckets of keys fix the problem, but lookups then degrade toward O(N) in the worst case (maybe slightly better, if you make the code much more complex), which throws the O(1) cost of the hash table out the window.

So it is super important to have a great hashing function in order to minimize those collisions! Not doing that can cause hard-to-find performance issues.

How is this solved in C++?

While both C# and Java offer a standard hash-combination routine (HashCode.Combine in .NET, Objects.hash in Java), so that you can easily combine multiple hash values, the C++ STL doesn't provide a hash-combination utility function!

There are alternatives… We can define our own hash combinator, or we can use boost::hash_combine from Boost. If you don't want to take another dependency (on boost), here's a convenient variadic template (full implementation in the appendix):

template <typename T, typename... TS>
inline void HashCombine(std::size_t& hashValue, const T& value, const TS&... values)

A fully functional example

We only need to replace our naïve hash combination (using addition) with the utility function:

template<>
struct std::hash<Person> {
    std::size_t operator()(const Person& person) const {
        std::size_t result = 0;

        HashCombine(result, person.First, person.Last);

        return result;
    }
};

Now running the same example as before, we get this:

equal=1 p1 hash=861162551 p2 hash=861162551
equal=0 p3 hash=1964666581

See that p3 hash is no longer the same as p1 and p2.

Problem solved!

Appendix – Show me the code!

inline void HashCombine(std::size_t& hashValue)
{
    // base case: nothing left to combine
}

template <typename T, typename... TS>
inline void HashCombine(std::size_t& hashValue, const T& value, const TS&... values)
{
    static std::hash<T> hasher;

    // Same scheme as boost::hash_combine, but with shift values 13 and 19;
    // the magic constant comes from TEA:
    // https://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm
    hashValue ^= hasher(value) + 0x9e3779b9 + (hashValue << 13) + (hashValue >> 19);
    HashCombine(hashValue, values...);
}

SeeSharpr by SeeSharpr is licensed under CC BY-SA 4.0