Databases

Table of Contents

Transactions

transaction: a sequence of actions we want to perform on a database

should be atomic: either run fully, or not at all. this avoids concurrency problems like:

For this, we need to get some ACID:

transaction: a list of actions

Schedules

scheduler decides execution order of concurrent database access

schedule is list of actions from set of transactions ('plan on how to execute transactions'). order in which 2 actions of transaction T appear in schedule must be same as order in T.

Serializability

serial schedule: if actions of different transactions are executed one after another (e.g. all of T2, then all of T1)

serializable schedule: if its effect on database is same as that of some serial schedule

actions in schedule conflict if they

conflicts may cause schedule to not be serializable

conflict types:

we can swap actions of different transactions if actions are non-conflicting.

conflict equivalent schedules: if they can be transformed into each other by swapping non-conflicting, adjacent transactions.

conflict-serializable: if conflict equivalent to some serial schedule

check it with a precedence graph:

Runtime serializability strategies

serializability during runtime: system doesn't know which transactions will run, and which items they'll access

strategies:

Pessimistic: lock-based, two phase locking

transactions must lock objects before using them

types:

2 phase locking protocol:

Deadlock handling

2 PL has the risk of deadlocks where both transactions wait for each other indefinitely. need to detect deadlock.

detection with Wait-for-Graphs

detection with timeout

Cascading rollbacks

cascadeless schedule

recoverable schedule:

schedules should always be recoverable. all cascadeless schedules are recoverable.

Strict 2 phase locking

same as 2PL, but a transaction releases all locks only when it's completed (commit/rollback). it's cascadeless, but still has deadlocks.

Preclaiming 2 phase locking

all needed locks are declared at start of transaction. therefore, no deadlocks. however, not applicable in multi-query transactions (where queries might depend on results of previous queries)

Granularity of locking

there's a tradeoff. the more specific your locking is (database, vs table, vs row level), the higher concurrency you have, and the higher overhead

multi-granularity locking - decide granularity of locks held for each transaction depending on characteristics of transaction

intention locks (do not conflict with each other):

an intention lock on coarser level of granularity means there is S/X lock on finer level of granularity.

before a granule g can be locked in S/X mode, the transaction has to obtain an IS/IX lock on all coarser granularities containing g

after all intention locks are granted, transaction can lock g in the announced mode

levels of granularity: database → table → row

Optimising performance

for each query in log:

How do read- and write-sets of queries intersect? What is chance of conflicts?

When you understand the query workload, you can:

Isolation levels

some degree of inconsistency may be acceptable to get increased concurrency & performance

SQL-92 levels:

SQL-92 isolation levels

phantom row problem: T1 locks all rows, but T2 inserts new row that isn't locked.

solutions:

many applications don't need full serializability, selecting a weaker but acceptable isolation level is part of database tuning.

Optimistic concurrency control

hope for the best, only check that no conflicts happened when committing. this saves locking overhead.

three phases:

  1. Read phase: execute transaction, but don't write data to disk. collect updates in transaction's private workspace
  2. Validation phase: when transaction wants to commit, DBMS test whether execution correct, and abort if needed.
  3. Write phase: transfer data from private workspace into database.

Phases 2, 3 have to be in non-interruptible critical section (val-write phase).

Validation

typically implemented by maintaining:

Backward-oriented optimistic concurrency control (BOCC)

Forward-oriented optimistic concurrency control (FOCC)

Multiversion concurrency control

with old object versions around, read-only transactions never need to be blocked

issues:

snapshot isolation: