Databases – Oddities

  • SQL is standardized, but many implementations do not follow it,
    • some things were implemented long before they were standardized;
  • Notable differences:
    • data types,
    • automatically generated keys,
    • constraints.
  • I will concentrate on the most common differences.

Relational Database Systems

  • Commonly Used RDBMS
    • Adaptive Server Enterprise (Sybase)
    • (IBM) DB2
    • Firebird
    • MariaDB
    • MySQL
    • Oracle Database
    • PostgreSQL
    • (Microsoft) SQL Server
    • SQLite

Database Interfaces

  • To use a database system, you need an interface:
    • Proprietary:
      • usable for a single database system,
      • more functions,
      • best performance.
    • Generic:
      • better support for standards
      • usable from different applications and for different DBS
      • ODBC, JDBC, PDO, ADODB, …

Database Interfaces cont.

  • Every interface has functions to:
    • connect to a database (connect or a constructor),
    • execute a query (exec, execute, prepare),
    • fetch a result of the SELECT query (fetch, open, next, seek) – work with resultset
    • retrieve an error result (error, lastError).

Pagination

  • Uses the keywords LIMIT and OFFSET on Postgres;
    • Other systems TOP, BOTTOM, ROWNUM keywords.
  • SELECT * FROM person ORDER BY id_person LIMIT 10 OFFSET 5
  • Remember to include ORDER BY! Otherwise you’ll have random results.
  • Criticism:
    • If it is not on the first page, search is better.
    • Maybe considered user unfriendly (combined with bad search).
  • Other application: “TOP 10 best customers”
    • How many results should such a query return?

Database Structure – Create

CREATE TABLE table_name (
    column data_type [NOT NULL] [DEFAULT value] [PRIMARY KEY]
    [, column data_type ...]
)
CREATE TABLE contact (
    id_contact serial PRIMARY KEY,
    id_person integer NOT NULL REFERENCES person (id_person),
    id_contact_type integer REFERENCES contact_type (id_contact_type),
    contact character varying(200) NOT NULL
)

Database Structure – Create

  • CREATE statements are practically used only during imports and exports.
    • Because of different data types, it is impossible to use them for migrations.
    • Use special tools for migrations.
  • Database Systems have very complex configurations:
    • Can cause big changes in behavior;
    • E.g. MySQL / MariaDB storage engines (MyISAM × InnoDB)
  • Different implementations of auto-increment

Data Dictionary

  • The structure of every DB is stored in another DB.
  • DD is a shared database information_schema: invisible, accessible, read-only.
  • To show tables in information_schema:
    • SELECT * FROM information_schema.tables WHERE table_schema = 'information_schema'
  • To show columns of a table:
    • SELECT * FROM information_schema.columns WHERE table_name = 'person'
  • If unsupported, there are commands like DESCRIBE, SHOW
  • Modifications through commands ALTER, RENAME, DROP, MODIFY, ADD

Data Types – String

  • character_varying(size) / varchar(size) / nvarchar(size)
    • a standard string limited by size
    • size in bytes for ANSI strings
    • size in characters for Unicode strings
  • char(size) / character(size) / nchar(size)
    • a string limited by size
    • filled with spaces, not very useful
    • LIKE compares without spaces
  • text / ntext
    • “unlimited” string
    • size of a cell in gigabytes

Data Types – Whole Numbers

  • bigint – 8B – -263..263
    • 263 = 9 223 372 036 854 775 808
  • integer – 4B – -231..232
    • 232 = 2 147 483 647
  • smallint – 2B – -215..215
    • 215 = 32 768
  • whole numbers:
    • signed (with a sign) – allows negative numbers
    • unsigned (without a sign) – allows only positive numbers
    • do not combine them!

Data Types – Decimal Numbers

  • real / float / double:
    • a floating-point decimal;
    • inexact! – e.g. 2 != 6 / 3 !
    • real range 1e-37..1e37
    • double range 1e-307..1e308
  • number / numeric / decimal(precision, scale):
    • a fixed-point decimal;
    • precision – maximum number of digits;
    • scale – number of digits after the decimal point.
  • money -263..263 = 92 233 720 368 547 758.08

Data Types – Timestamp

  • Date and time is stored as timestamp (an unambiguous point in time).
  • Usually number of milliseconds / microseconds from 1.1.1970.
  • Range from 4713 BC to 294276 AD.
  • Conversion to a string date is very complex, never ever do it yourself!
  • Timestamp fixes leap days, leap seconds, DST…
  • Timestamp should contain a time zone (use UTC whenever possible).
  • Does not allow to store incomplete dates (‘January 2016’), use separate columns.

Data Types – Date Types

  • datetime / date / time:
    • In mysql datetime is sort of same as timestamp
  • timestamp, time, date cannot be used to store a date / datetime interval:
    • never assume that an hour has 3600 seconds (can be from 0 up to 7200);
      • because of leap days, seconds;
      • because of DST;
    • interval – store the interval of date time values.

Data Types – Date Alternative

  • You may encounter using native application format:
    • a column data type is an integer
    • use the timestamp of the application language:
<?php
...
$db->execute(
    "UPDATE person SET birth_day = :birthDay",
    [':birthDay', time()]
);

Data Types – Date Alternative

  • pros:
    • simple, for one application without problems
  • cons:
    • non-atomic (1NF), impossible to select a part of the date
    • ordering works fine though
    • impossible to store or compute an interval
    • brings back application and database dependency
    • unreliable when more applications use the database

Data Types – Binary Data

  • bit / boolean / binary / bytea – a single value,
    • BLOB / image – binary data,
    • BLOB – binary large object (LOB, LO).
  • Details of working with LO are dependent on the database interface.
    • Escaped data may be sent within the SQL string (limited size),
    • Binary data may be sent via special functions (‘unlimited’ size).
  • Use this to store medium sized files, better than storing them in a file system.

Automatically Generated Key

  • The most commonly used type of a primary key:
    • an abstract record identifier,
    • independent on outside conditions,
    • auto-increment or sequence,
    • simpler to use than compound keys (all parts must be used).
  • There should be a natural key defined too (for the end-user).
  • No database follows the standard fully (historical reasons).

Auto-increment in PostgreSQL

  • The column is assigned a special data type serial when creating the table.
    • CREATE TABLE table (id serial NOT NULL,
    • creates a sequence to generate the values;
    • assigns a default value to:
      • nextval('sequence_name')
    • cannot be exported exactly (the column is an integer).
  • To obtain the last value, call:
    • SELECT currval('sequence_name'),
    • or $db->lastInsertId('sequence_name') in PDO.
  • If a value is inserted explicitly, the sequence won’t update!
    • Subsequent inserts will probably fail!

Currval or Max?

  • To obtain the last value of an inserted row, you must always use predefined functions.
  • Correct:
    • SELECT currval('sequence_name')
    • $db->lastInsertId('sequence_name')
    • or:
SELECT id_person FROM person
WHERE first_name = 'X' AND last_name = 'Y'
    AND nickname = 'Z'

Currval or Max? cont.

  • Incorrect:
    • SELECT MAX(id) FROM table
    • Does not work in concurrent environment.
  • Obtaining the value must be thread-safe.
    • Hard to test – requires multiple users in rapid succession.
  • The last-insert-value is tied to the database session (connection).

Integrity Cons