Database
The most basic form of database is appending records to the end of a log. This makes writes fast (a simple append), but reads are slow because the whole log has to be scanned. To speed up reads, we can add indexes, which speed up reads at the cost of slowing down writes, since every index must be updated on each write.
To avoid the log growing without bound, we break it into segments: when a segment reaches a certain size threshold, we start a new one. From time to time, we perform compaction, which throws away duplicate keys (keeping only the most recent value) and combines segments together.
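Below is a minimal, purely in-memory sketch of this idea (the class, the per-segment record limit, and the tuple record format are illustrative assumptions, not any real engine): writes append to the newest segment, reads scan segments newest-first, and compaction keeps only the latest value per key.

```python
# Illustrative sketch of an append-only key-value log broken into segments.
class AppendOnlyStore:
    def __init__(self, segment_limit=100):
        self.segment_limit = segment_limit   # records per segment before rolling over
        self.segments = [[]]                 # each segment is a list of (key, value) records

    def put(self, key, value):
        # Writing is cheap: just append to the end of the newest segment.
        self.segments[-1].append((key, value))
        if len(self.segments[-1]) >= self.segment_limit:
            self.segments.append([])         # threshold reached: start a new segment

    def get(self, key):
        # Without an index, reading means scanning segments from newest to oldest.
        for segment in reversed(self.segments):
            for k, v in reversed(segment):
                if k == key:
                    return v
        return None

    def compact(self):
        # Compaction keeps only the most recent value per key and merges segments.
        latest = {}
        for segment in self.segments:
            for k, v in segment:
                latest[k] = v
        self.segments = [list(latest.items())]
```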
Indexes
Index 1: Hash Index
We keep an in-memory hashmap that maps each key to the byte offset of its latest record in the log. Writing is faster than with a B-Tree or SSTable because every write is a sequential append, and merging old segments keeps the data from becoming fragmented.
However, range queries are inefficient because the keys are not kept in sorted order, and the in-memory hashmap must fit into memory — if there are too many keys, we run out of space.
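A small sketch of such a hash-indexed log, assuming a single append-only segment file (the file name and the `key,value` line format are made up for illustration):

```python
# Sketch of a hash index over an append-only log file: an in-memory dict maps
# each key to the byte offset of its latest record.
class HashIndexedLog:
    def __init__(self, path="segment.log"):
        self.path = path
        self.index = {}                      # key -> byte offset of latest record
        open(self.path, "a").close()         # make sure the segment file exists

    def put(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.tell()                # byte offset where this record starts
            f.write(record)                  # sequential append: fast writes
        self.index[key] = offset             # the hashmap must fit in memory

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)                   # one seek instead of scanning the whole log
            line = f.readline().decode("utf-8").rstrip("\n")
        return line.split(",", 1)[1]
```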
Index 2: Sorted String Table
Similar to the hash-indexed log, except that the keys within each segment are kept in sorted order. To merge segments, compare them side by side and take the smaller key; if the same key appears in both, take the value from the more recent segment. The benefit is that we no longer need to keep every key in memory — a sparse index (one key every few kilobytes) is enough, because the sorted order lets us scan forward from the nearest indexed key.
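A sketch of the side-by-side merge, assuming each SSTable is just a sorted list of (key, value) pairs — an illustrative simplification of the on-disk format:

```python
# Merge two sorted segments; on a duplicate key, the newer segment's value wins.
def merge_sstables(older, newer):
    merged = []
    i = j = 0
    while i < len(older) and j < len(newer):
        if older[i][0] < newer[j][0]:
            merged.append(older[i]); i += 1
        elif older[i][0] > newer[j][0]:
            merged.append(newer[j]); j += 1
        else:
            merged.append(newer[j])          # same key: keep the newer segment's value
            i += 1; j += 1
    merged.extend(older[i:])
    merged.extend(newer[j:])
    return merged

# Example: the duplicate key "b" takes the value from the newer segment.
print(merge_sstables([("a", 1), ("b", 2)], [("b", 9), ("c", 3)]))
# [('a', 1), ('b', 9), ('c', 3)]
```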
Index 3: Log-Structured Merge Tree (LSM Tree)
- Writes go to an in-memory balanced tree (the current instance of the LSM tree).
- When the tree reaches a certain size threshold, write it out to disk as a sorted segment and start a new in-memory instance.
To serve a read request, we search the in-memory tree first, then the most recent on-disk segment, then progressively older segments. This is inefficient for keys that do not exist (every segment has to be checked), so a Bloom filter can be used to skip segments that definitely do not contain the key.
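A sketch of this read/write path, where a plain dict stands in for the in-memory tree and a per-segment key set stands in for the Bloom filter (both are illustrative simplifications):

```python
# Sketch of an LSM-tree write and read path.
class LSMTree:
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.segments = []                   # list of (sorted key->value dict, key set)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            flushed = dict(sorted(self.memtable.items()))  # write out as a sorted run
            self.segments.append((flushed, set(flushed)))  # key set stands in for a Bloom filter
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:             # 1. most recent data lives in memory
            return self.memtable[key]
        for segment, maybe_has in reversed(self.segments):  # 2. then newest segment first
            if key not in maybe_has:         # Bloom filter answers "definitely not here"
                continue
            if key in segment:
                return segment[key]
        return None
```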
Size-tiered compaction — Newer and smaller SSTables are successively merged into older and larger SSTables.
Leveled compaction — The key range is split up into smaller SSTables and older data is moved into separate levels, which allows compaction to proceed more incrementally and use less disk space.
Index 4: B-Tree Index
B-trees break the database down into fixed-size blocks or pages and read or write one page at a time. A page can refer to other pages on disk, similar to a pointer, so the pages form a tree.
We start at the root of the B-Tree, which is a page in which each child reference is in charge of a contiguous range of keys. We keep traversing until we reach a page containing individual keys (a leaf page), which holds either the value itself or a reference to where the value can be found.
Branching Factor: Number of references per page
Update the value for an existing key: search for the leaf page containing the key, change the value in that page, and write the whole page back to disk.
Create a new key: find the page whose range encompasses the new key and add it to that page. If there is not enough free space in the page, split it into two half-full pages and update the parent page to account for the new subdivision of key ranges.
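A sketch of the lookup path only (the `Page` layout and the in-memory example tree are illustrative assumptions; a real B-tree stores pages on disk and also implements inserts and page splits):

```python
from bisect import bisect_right

# Each internal page holds separator keys and references to child pages;
# traversal continues until a leaf page holding the values is reached.
class Page:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys                 # separator keys (internal) or stored keys (leaf)
        self.children = children         # child page references; None for a leaf page
        self.values = values             # values; only present on leaf pages

    def is_leaf(self):
        return self.children is None

def lookup(root, key):
    page = root
    while not page.is_leaf():
        # Follow the child whose key range encompasses the search key.
        page = page.children[bisect_right(page.keys, key)]
    if key in page.keys:
        return page.values[page.keys.index(key)]
    return None

# Example with branching factor 3: the root splits the key space at 10 and 20.
leaves = [Page([1, 5], values=["a", "b"]),
          Page([10, 15], values=["c", "d"]),
          Page([20, 25], values=["e", "f"])]
root = Page([10, 20], children=leaves)
print(lookup(root, 15))   # "d"
```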
The basic underlying operation of a B-tree is to overwrite a page on disk in place. Some operations require several pages to be overwritten (e.g. splitting a page and updating its parent), and if the database crashes after only some of those pages have been written, the index can end up corrupted ⇒ Write-Ahead Log (WAL): an append-only file to which every modification is written before it is applied to the pages themselves.
Why LSM | Why B-Tree |
---|---|
In a B-tree, some space within each page goes unused, so disk space may be wasted | Searching for a key in an LSM tree is comparatively inefficient, since the in-memory tree and several segments may need to be checked |
In a B-tree, every piece of data has to be written twice — once to the page and once to the WAL | |
OLTP vs OLAP
Online Transaction Processing | Online Analytical Processing |
---|---|
Mainly used for business transactions | Mainly used for analytical purposes |
Data written from user input | Data arrives via stream processing or bulk import |
Reads a few rows at a time, fetched by key | Aggregates over a large number of rows |
ACID Properties
Acronym | Description |
---|---|
Atomicity | A transaction's writes are applied all together or not at all — on abort, nothing is left partially applied |
Consistency | The application's notion of the data being in a consistent state (invariants hold) is preserved |
Isolation | Concurrently running transactions do not interfere with each other, as if each ran alone |
Durability | Once a transaction has committed, its data is not lost even if there is a hardware fault |
Isolation Levels vs Read Phenomena
Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
---|---|---|---|
Read uncommitted | Yes | Yes | Yes |
Read committed | No | Yes | Yes |
Repeatable Read | No | No | Yes |
Serializable | No | No | No |
The higher the isolation level, the stronger the guarantee that transactions see accurate data. However, this incurs a performance cost on the database due to reduced concurrency.
Dirty Reads
Transaction A reads data that Transaction B has written but not yet committed. The problem is that if Transaction B aborts, the data Transaction A read is inaccurate — it never actually took effect.
On the other hand, a dirty write happens when both transactions write to the same data and Transaction A overwrites uncommitted data written by Transaction B. Read committed isolation guarantees that there are no dirty reads and no dirty writes.
Non-Repeatable Read
Consider a scenario where Transaction 2 transfers $100 from account 1 to account 2 while Transaction 1 reads both accounts, each of which starts at $500. If Transaction 1 reads account 2 before the transfer commits ($500) and account 1 after it commits ($400), it sees a total of $900 even though the total will eventually be $1000.
This does not violate read committed, because both values Transaction 1 reads are committed values — account 2's balance as committed before the transfer and account 1's balance as committed after it. We are never reading a partially updated, uncommitted row.
This phenomenon is known as a non-repeatable read. It is especially unacceptable in two situations: (1) taking a backup, because the backup will mix old and new data together, and (2) running analytical queries or integrity checks, because they will return nonsensical results.
Snapshot Isolation
Each transaction reads from a consistent snapshot of the committed data taken at the start of the transaction, and ignores any later writes by other transactions, whether they are ongoing, aborted or committed. Writers still take write locks (to prevent dirty writes), but readers need no locks at all.
Multi-Version Concurrency Control
For each row, there are two additional fields called `created_by` and `deleted_by`. When a transaction inserts a new row, it fills in the `created_by` field with its transaction id; when it deletes a row, it fills in the `deleted_by` field. When it updates a row, it actually marks the old row as deleted and creates a new row.
When it is guaranteed that no transaction can any longer access a deleted row, a garbage-collection process removes it.
Since Transaction 1 takes a snapshot of the committed values before it begins, it will see both accounts as $500, because it ignores the writes made by Transaction 2 after the snapshot.
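A sketch of the MVCC visibility rule under a deliberately simplified model: a version is visible only if its creator had committed when the snapshot was taken and it has not been deleted by such a transaction. The `created_by`/`deleted_by` names follow the text above; everything else is an illustrative assumption.

```python
# Each row version carries created_by / deleted_by transaction ids.
class RowVersion:
    def __init__(self, value, created_by, deleted_by=None):
        self.value = value
        self.created_by = created_by
        self.deleted_by = deleted_by

def visible(version, snapshot_committed):
    """snapshot_committed: set of txids that had committed when the snapshot was taken."""
    if version.created_by not in snapshot_committed:
        return False                      # creator had not committed: invisible
    if version.deleted_by is not None and version.deleted_by in snapshot_committed:
        return False                      # already deleted by a committed transaction
    return True

def read(versions, snapshot_committed):
    for v in reversed(versions):          # newest version first
        if visible(v, snapshot_committed):
            return v.value
    return None

# An update by txn 13 marks the old version deleted and creates a new one.
account1 = [RowVersion(500, created_by=12), RowVersion(400, created_by=13)]
account1[0].deleted_by = 13
print(read(account1, snapshot_committed={12}))      # 500: txn 13 had not committed yet
print(read(account1, snapshot_committed={12, 13}))  # 400: txn 13's write is now visible
```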
Lost Update
A lost update usually occurs in a read-modify-write cycle: between reading a value and writing the modified value back, another transaction may have changed it, so one of the two updates is silently lost.
Solution 1: Atomic Write operations
Take a lock on the object when it is read and only release it once the update has been applied (cursor stability), or force all atomic operations to execute on a single thread.
UPDATE counters SET value = value + 1 WHERE key = 'foo';
Solution 2: Explicit Locking of rows
BEGIN;
SELECT * FROM figures WHERE name = 'robot' AND game_id = 222 FOR UPDATE;
COMMIT;
Solution 3: Automatic Detection
Some databases can detect lost updates automatically; once detected, the affected transaction is aborted and must retry its read-modify-write cycle.
Solution 4: Compare and Set
Allow an update to take effect only if the value has not changed since we last read it. If the current value does not match, the update does nothing and we retry the read-modify-write cycle.
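A sketch of the compare-and-set retry loop in application code; `store` is a hypothetical dict standing in for the database, and in a real database the compare and the set are performed atomically (for example, within a single UPDATE whose WHERE clause checks the old value):

```python
def compare_and_set(store, key, expected, new):
    # In a real database the compare and the set happen atomically;
    # here they are only shown together for illustration.
    if store.get(key) == expected:
        store[key] = new
        return True
    return False

def increment(store, key, max_retries=10):
    for _ in range(max_retries):
        old = store[key]                                  # read
        if compare_and_set(store, key, old, old + 1):     # write only if unchanged
            return old + 1
        # another writer got in first: retry the whole read-modify-write cycle
    raise RuntimeError("too many concurrent updates")

counters = {"foo": 41}
print(increment(counters, "foo"))   # 42
```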
Solution 5: Replication Configuration
- Last Write Wins (LWW)
- Atomic operations
- Allow several conflicting versions to coexist, and use application code or special data structures to resolve and merge the versions later.
Write Skew
Two doctors are on call for a shift and both want to take sick leave, but each may only do so if the other doctor remains on call. If they query the database concurrently, the check returns true for both of them (each sees the other still on call), and since the condition holds for both, they each update their own on-call status to false.
However, this violates the requirement that at least one doctor stays on call. This anomaly is known as write skew. The only reliable way to prevent it is the serialisable isolation level, because we cannot in general guarantee to lock the relevant rows using `FOR UPDATE` — the two transactions write to different rows than the ones their checks depend on.
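A sketch of this race under snapshot isolation (the dict and the snapshot copies are an illustrative stand-in for two transactions reading from the same snapshot): both checks pass, the two writes touch different rows, so no conflict is detected, and the invariant is silently broken.

```python
on_call = {"alice": True, "bob": True}

def can_go_off_call(snapshot):
    # The rule: a doctor may go off call only if someone would still remain on call.
    return sum(snapshot.values()) >= 2

# Both transactions take their snapshot before either write commits.
snapshot_alice = dict(on_call)
snapshot_bob = dict(on_call)

if can_go_off_call(snapshot_alice):
    on_call["alice"] = False     # Alice's transaction updates only her own row
if can_go_off_call(snapshot_bob):
    on_call["bob"] = False       # Bob's transaction updates only his own row

print(on_call)   # {'alice': False, 'bob': False}: nobody is left on call
```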
Serialisation
Two Phase Locking (2PL)
- If a transaction wants to read an object, it must first acquire the lock in shared mode. Several transactions are allowed to hold the lock in shared mode simultaneously, but if another transaction already holds the lock in exclusive mode, they must wait.
- If a transaction wants to write to an object, it must first acquire the lock in exclusive mode. Only one transaction may hold the lock in exclusive mode at a time.
- If a transaction first reads and then writes an object, it may upgrade its shared lock to an exclusive lock.
- The two phases: locks are acquired while the transaction executes (first phase), and all locks are released only at the end of the transaction (second phase).
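A sketch of a 2PL-style lock table under simplifying assumptions: a transaction that would have to wait simply raises an error instead of being queued, and deadlock detection is omitted.

```python
class LockManager:
    def __init__(self):
        self.shared = {}      # object -> set of txids holding a shared lock
        self.exclusive = {}   # object -> txid holding the exclusive lock

    def acquire_shared(self, txid, obj):
        holder = self.exclusive.get(obj)
        if holder is not None and holder != txid:
            raise RuntimeError("would block: exclusive lock held by another transaction")
        self.shared.setdefault(obj, set()).add(txid)

    def acquire_exclusive(self, txid, obj):
        others_shared = self.shared.get(obj, set()) - {txid}
        holder = self.exclusive.get(obj)
        if others_shared or (holder is not None and holder != txid):
            raise RuntimeError("would block: lock held by another transaction")
        self.exclusive[obj] = txid           # also covers upgrading an own shared lock

    def release_all(self, txid):
        # Second phase: every lock the transaction holds is released at commit/abort.
        for holders in self.shared.values():
            holders.discard(txid)
        for obj in [o for o, t in self.exclusive.items() if t == txid]:
            del self.exclusive[obj]
```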