Databases

Table of Contents

Relational normal forms

Functional dependencies

Functional dependencies are a generalization of keys. This theory defines when a relation is in normal form.

functional dependency: if two rows agree on a value in one column, they must also agree on the other column e.g. here, functional dependency is INAME → PHONE, because phone number only depends on the instructor intuitively:

FD table example

This is read as "INAME (functionally, uniquely) determines PHONE".

What is a functional dependency?

Keys are functional dependencies.

During database design, only unquestionable conditions should be used as functional dependencies.

It's usually bad database design if schema's relations violate normal form. If it's violated, data is stored redundantly and information about different concepts is intermixed.

Working with functional dependencies

The database designer is not interested in all functional dependencies, but only in a representative functional dependency set that implies all others.

Implications of functional dependencies:

Computing the cover: for given set of attributes, see which they imply through FDs. Extended cover with those, and repeat.

Checking whether a → β is implied by a functional dependency set:

  1. computer cover α⁺ of α:
  2. check if β ⊆ α⁺:
    • set of functional dependencies F implies α → β iff β ⊆ α⁺_F

example:

Example to check functional dependency

How to determine keys

Determining a minimal key (slides 64-70)

determining all minimal keys (slides 71-88)

Determinants

determinant: non-trivial, minimal functional dependency

{A1, ..., An} is determinant for {B1, ..., Bm} if:

Consequences of bad database design

usually if table contains an functional dependency that's not implied by a key, it's a sign of bad database design.

leads to:

problem is that general FDs are not supported by relational databases. so you have to transform them into key constraints (database normalisation).

Normal forms

Normal form types:

Normalisation algorithms can construct good relation schemas from attributes and functional dependencies. When an ER model is well designed, resulting derived relational tables will automatically be in BCNF.

First normal form (1NF):

Boyce-Codd Normal Form (BCNF)

Example of checking BCNF

Splitting relations

If table R is not in BCNF, we can split it into two tables. You split based on violating FD.

Table decomposition:

Splitting has to be []{#Relational normal forms-Normal forms-Boyce-Codd Normal Form (BCNF)-Splitting relations-lossless}lossless so that you can reconstruct original relation by a join.

Decomposition theorem: split is guaranteed to be lossless if intersection of attributes of new tables is a key of at least one of them.

It's always possible to transform relation into BCNF by lossless splitting. The resulting schema can always represent all previously possible states, but it may be more general and allow states that do not exist in the old schema.

With computable columns, splitting the relation is not the right solution - instead, define a view with the computed column.

A good decomposition should guarantee preservation of FDs:

Synthesis

Determining canonical (minimal) set of FDs:

BCNF synthesis (relation R, set of FDs for R):

  1. Determine canonical set of FDs
    • e.g. D→A, E→D, E→C, B→E
  2. Maximise rhs of FDs
    • {D}+ - D = {A}
    • \({E}_{-E}^+\) = {D,C,A}
    • \({B}_{-B}^+\) = {E,D,C,A} ({B} is the minimal key in this case)
  3. Split on violating FDs. For each FD, remove the rhs from relations and add a new relation, with the lhs of that FD being the key.

Third normal form (3NF)

retains all FDs, so more popular than BCNF. If we leave table in 3NF, we have non-key constraints - the FDs that are not implied by keys.

Synthesis

Produces lossless decomposition of relation into 3NF that preserves FDs.

  1. Determine canonical set of FDs
    • e.g. D→A, E→D, E→C, B→E
  2. Merge FDs with same lhs and create relations from them
    • R1(D, A)
    • R2(E,C,D)
    • R3(B,E)
  3. Check if any of relations has key of original relation. If not, create a new relation with attributes of the minimal key.
    • In this case, R(A,B,C,D,E); R3 has B, so don't need to do anything.
  4. For all pairs of created relations: are they contained in another relation? If yes, remove.

MVD & 4NF

Multivalued dependencies (MVDs)

Constraints that give a necessary and sufficient condition for lossless decomposition. They lead to fourth normal form (4NF).

Multivalued dependency NAME ⤅ PROG_LANG means that set of vales in in column PROG_LANG associated with every NAME is independent of all other columns. i.e. there's an embedded function from NAME to sets of PROG_LANG

The MVD holds if, whenever two tuples agree on NAME, one can exchange their PROG_LANG values and the resulting tuples are in the same table.

MVDs always come in pairs. For relation R(A1,...,An, B1,...,Bm, C1,...,Ck) these multivalued dependencies are equivalent:

Every FD is also a MVD.

Fourth Normal Form (4NF)

A relation is in in 4NF if every MVD A1,...,An ⤅ B1,...,Bm is either trivial, or implied by a key.

If a relation is in 4NF, it's also automatically in BCNF.

Normal forms & ER design

If a 'good' ER schema is transformed into the relational model, the result will satisfy all normal forms (4NF, BCNF, 3NF). If a normal form is violated, there's a flaw in the input ER schema.

In the ER model, the entity has to be split in case of a violation.

Violations of BCNF can also be due to wrong placement of an attribute.

If an attribute of a ternary relationship only depends on two of the entities, it's a BCNF violation.

Why normalize?

Denormalization

The process of adding redundant columns to the database to improve performance.

This leads to the reintroduction of (some) anomalies, but you avoid huge amounts of joins.