Database Tables

In es-entity every Entity gets 2 tables - the index and the events table. This section will explain the rational behind that.

The Events Table

The events table always has the same fields:

CREATE TABLE user_events (
  -- The entity id
  id UUID NOT NULL REFERENCES users(id),
  -- The sequence number of the event
  -- Starting at 1 and incrementing by 1 each event
  sequence INT NOT NULL,
  -- The 'type' of the event (corresponding to the enum variant)
  event_type VARCHAR NOT NULL,
  -- The event data serialized as a JSON blob
  event JSONB NOT NULL,
  -- The 'event context'
  -- additional metadata that can be collected out of band
  -- only populated if 'event_context' attribute is set on the EsEvent
  context JSONB DEFAULT NULL,
  -- The time the event was recorded
  recorded_at TIMESTAMPTZ NOT NULL,
  -- Unique constraint to ensure there are no duplicate sequence numbers
  UNIQUE(id, sequence)
);

In fact we could persist all events to a global table with that schema but partitioning the events per Entity gives us some benefits when querying (like read performance and referential integrity).

Intuitively you might think this is all we need as we can very easily query all the events for a specific Entity:

SELECT * FROM user_events WHERE id = $1 ORDER BY sequence

This is correct if we know the id of the Entity we are looking up. But it becomes a lot more tricky when we want to do a lookup on a non-id field.

Assuming the Event-enum looks like this:

#![allow(unused)]
fn main() {
pub enum UserEvent {
    Initialized { id: u64, name: String },
    NameUpdated { name: String },
    EmailUpdated { email: String },
}
}

and we want to lookup a user by email, the query would quickly become a lot more complicated. Lets consider the naive query:

SELECT * FROM user_events WHERE event->>'email' = $1;

This doesn’t work as it only gets a single event - but we want all events for that Entity.

SELECT *
FROM user_events
WHERE id = (
  SELECT id
  FROM user_events
  WHERE event->>'email' = $1
  LIMIT 1
)
ORDER BY sequence;

This also doesn’t work because perhaps the event that was found wasn’t the latest EmailUpdated event in the Users history. But we want to get the user who’s email is currently $1. So it could find some false positives.

When iterating with ChatGPT the next suggestion is:

WITH latest_email_updates AS (
  SELECT id, MAX(sequence) AS max_sequence
  FROM user_events
  WHERE event_type = 'email_updated'
  GROUP BY id
),
latest_emails AS (
  SELECT e.id, e.event->>'email' AS email
  FROM user_events e
  JOIN latest_email_updates leu
    ON e.id = leu.id AND e.sequence = leu.max_sequence
  WHERE e.event_type = 'email_updated'
),
target_user AS (
  SELECT id
  FROM latest_emails
  WHERE email = $1
)
SELECT *
FROM user_events
WHERE id = (SELECT id FROM target_user)
ORDER BY sequence;

This query might execute what we want but it still has issues. The worst one being that we are leaking a lot of domain knowledge into the query. Specifically the presence and shape of the EmailUpdated event is encoded into the query. Preferably the specifics of the Event-schemas would only need to be known on the domain side encoded in the EntityEvent enum.

Also the whole query is quite inefficient. Sure we could add an index on the event->>'email' field but that would introduce more implicit coupling. Also what if we wanted something like a UNIQUE constraint on the email - but still allow emails swapped multiple times.

The Index Table

Enter the index table. The index-table is a table that hosts 1 row per Entity with the columns populated by the latest values. In that sense it looks very similar to a table that might hold the entire state of the Entity in a typical update-in-place persistence strategy. The difference is that we only include columns that we want to index for fast lookup or some kind of constraint like UNIQUE or REFERENCES. In that sense it is purely an optimization and does not represent the entire state of the Entity - for that you must load all the events.

CREATE TABLE users (
  id UUID PRIMARY KEY,
  created_at TIMESTAMPTZ NOT NULL,

  email VARCHAR UNIQUE
);

Now the query simplifies to:

WITH target_entity AS (
  SELECT id
  FROM users
  WHERE email = $1
)
SELECT e.*
FROM user_events e
JOIN target_entity te ON e.id = te.id
ORDER BY e.sequence;

As a result the query is much simpler and we are no longer leaking any domain information. We just have to ensure the index table gets updated atomically as we append the events to the events table.

Keyboard shortcuts

ES Entity Framework

Database Tables

The Events Table

The Index Table