Database Systems |

Introduction

Data – some values obtained by observation or measuring.
Information – interpretation of data and relations between them.
- Data which lower entropy of the systems.
- Repeated / duplicated data bear no information.
Database – a generic structure for storing data;
- could be even a file system.
Database system – an application managing data in a database.
Interface – means for connecting systems together.

The company information system is composed of unrelated sub-systems (SIS):
- each system has its own application code,
- each system has its own data store.
Format and structure of data are tied to each sub-system requirements:
- therefore they are incompatible,
- where there is incompatibility, there is isolation,
- where there is isolation, there is redundancy,
- where there is redundancy, there is inconsistency.
No one wants inconsistency.

Isolation and incompatibility – data can be used only by the ‘owning’ application.
- Data is unreadable (unknown or proprietary format).
- Data is unintelligible (no documentation).
Redundancy – one piece of data in multiple locations.
- Do not confuse with backups (no master – copy).
Inconsistency – occurs when redundant data is incompletely updated.
- Civil service is a good (bad ?) example.
Ad-hoc queries are very expensive (substantial effort or new application).
Difficult to ensure data integrity and security.

Integrity – the state of the data (database) when all data is complete and correct.
What is correct is determined by integrity constraints, e.g.:
- Age is a positive integer.
- Every person is of some age (her age cannot be empty).
- Every person has an address (his address cannot be empty).
Integrity constraints are not easy!
Integrity violation occurs usually as a result of bad data sanitization.

Data is stored and managed in one place.
Data files have set structure, hidden to outside world.
Data may be accessed only through the DBS interface.
Database system takes care of concurrent access of multiple users, permissions and data integrity.
Database system operations are optimized to lower time complexity.

OLTP – Online Transaction Processing:
- the foundation of all – processing of daily operations,
- every information system must have this,
DWH – Data Warehouse:
- analytical systems, analytical queries,
- large amount of read data, almost no data written.
Others:
- special analysis – text processing, image processing, predictions, etc. - Different technologies, but SQL language is used in most DBS. - Focus on OLTP, because that is the starting point.

A set of resources to model and describe data.
The output of DM is a database schema – declaration of the database structure (database template).
Different tools:
- functions,
- graphs (from graph theory),
- relations (sets),
- objects.
Database is something which can hold data.
Database schema is the description of a database structure.
Object data model and relation data model are two approaches to the same problem.

Data is organized in tree-like structures.
The relation between records is parent-child.
A child has one parent.
One parent can have more children.
It is not very suitable for general-use databases, special uses:
- DNS, LDAP,
- file systems (links ignored),
- XML (partially HTML).

An extended hierarchical model.
Each record (vertex) can be connected to any number of different vertices.
Handles relations 1:N, M:N and cyclic relations.
Very universal and flexible, no usable implementations.
Considered as a Network model with limitations:
- a file system (links not ignored),
- HTML DOM (Document Object Model).

Describes data with relations (something like a table, will explain later).
Designed around 1970 (E. F. Codd, in IBM).
Built on hard maths – relational algebra.
Relational database system – RDBS.
Gave birth to and uses the SQL language for querying.
Widespread:
- DB2 (IBM), Firebird, MySQL,
- Oracle Database, PostgreSQL, SQL Server (Microsoft), etc.

Transaction – one action, optionally composed of multiple operations (e.g. insert a person together with contact information and meetings, …).
Atomicity – either all operations in a transaction must be performed, or none of them (rollback).
Consistency – before and after the transaction, the database is in a consistent state, during the transaction it may not be (imagine buying something).
Isolation – concurrently running transactions behave as if they are running sequentially.
Durability – a finished transaction is permanent.
ACID is what common sense expects from a database.

Virtually all ACID databases use the SQL language or some SQL dialect or a similar language.
The other option is BASE:
- Basically Available, Soft state, Eventual consistency;
- if a record is not updated any more, all queries to that record will eventually return consistent results;
- unusable for an ordinary information system;
- required for massive distributed services;
- somewhat related to NoSQL databases (key-value databases) = associative array, graph, document, …);
There are also columnar and key-value databases which may by either ACID or BASE.

May one database system contain multiple databases?
What is the direct outcome of redundant data in an information system?
Is is possible to run two transactions at once in an ACID database?
Is an information system only software?
Is data better than information?
How is information transmitted?
Does every DBS have to use the SQL language?
Should accounting use an ACID or a BASE database ?
What are differences between the object data model and the relational data model?