PostgreSQL: A Reusable SQL Library of Domains

Published in

The Opinionated Software Architect

10 min readFeb 2, 2023

Introducing a PostgreSQL library (lib) that boosts SQL code reuse, improves readability, and efficiently enforces data constraints for PostgreSQL domains.

Introduction

Introducing the first of our reusable SQL libraries for your projects — the PostgreSQL domains SQL library, now available in the lib repository managed by SQL Package Manager (sqlpm).

In this article, we’ll explore the fundamental concept of PostgreSQL domains and shed light on the innovative design choices that went into creating the lib repository.

Although we have touched on it, a more in-depth architectural analysis of when and where to use constraints is best covered in a separate article.

The lib repository is incomplete and additions are greatly appreciated!

The Opinionated TLDR

The Opinionated Architect’s TLDR:

Consistently enforce software constraints as close to the persistence layer as possible. If invalid data can be persisted, then the data is not reliable.
Leverage PostgreSQL constraints as much as possible.
Using domains (reusable constraints) lead to improved SQL code readability and reuse.
Avoid premature optimization. Use constraints liberally and if scalability becomes an issue, move constraints to the middle layer: removing them from the database layer.
Favor defining the meaning of NULL in domains rather than allowing for nullable columns.
Leverage or place SQL in libraries such as the one defined in lib.

Where To Constrain Data

PostgreSQL boasts robust constraint capabilities that safeguard data integrity and consistency. Implementing database constraints is the optimal solution for ensuring data validity.

Merely relying on middle-tier data validation can leave the database exposed to storing invalid data. To guarantee the reliability and accuracy of the data retrieved from the database, it is crucial to enforce constraints at the database level. By consistently using database constraints, organizations can have full confidence in the integrity of their stored data.

PostgreSQL offers a feature called domains to establish reusable constraints.

Constrain Then Optimize

Constraining data at the database layer may incur some overhead during SQL transactions as constraints must be applied prior to data persistence. While it may appear to scale better from an architectural standpoint to enforce constraints in other tiers (middle tier, an api gateway, etc.), this approach leaves the database vulnerable to storing bad data.

Instead of prematurely optimizing, prioritize constraint enforcement at the database layer first. If performance issues arise, consider reducing constraints at the database layer and relying more on the other tiers.

SQL Domains: Key Benefits

Centralized Management: Domains centralize constraint and type management, allowing modifications to be made in one location and easily propagated throughout the schema. However, depending on the changes being made, propagation in some RDBMS may lead to downtime. Plan accordingly.
ORM (Object-Relational Mapping) Libraries: Many ORM libraries, such as Hibernate or Entity Framework, can automatically map PostgreSQL domains to corresponding data types in the middle tier, enforcing constraints and validation rules defined in the domain on middle-tier objects.
Improved Readability: Domains make code more comprehensible and emphasize column intent.
Constraint Enforcement: Domains allow for defining constraints on stored values, such as value ranges or specific formats, ensuring consistent and valid data in columns.
Improved Maintainability: By creating a domain, you can define a specific type, constraints, and default value once and reuse it across multiple columns and tables, reducing code and enhancing schema maintainability.”

Design Approaches on The Lib Repository

Column Data Purpose

Columns in a database can serve two main purposes:

Computer-facing: Data in these columns is used for internal operations and is not meant to be shown to the user. Examples include unique IDs, foreign keys, and other values for linking and referencing data. Avoid altering values in computer-facing columns, such as primary keys.
User-facing: Data in these columns is intended to be both displayed to and editable by the user, serving to convey information. Examples include display names, descriptions, and other values that present data to the user.

In defining domains, we take into consideration the intended purpose of each column.

Nulls, Constraints, and Domains

When defining columns in a database, it’s important to consider whether NULL values should be allowed. While allowing NULLs can be useful when a value is unknown or undetermined, it’s often more appropriate to specify a non-null default value or a meaningful definition of NULL in a domain.

For example, consider a domain for representing the total value of a quote in US dollars. The total is calculated based on the price of each item and the number of items for each line item. The domain for this column can be defined as follows:

Decimal places: To ensure that the quote total is accurate to the nearest cent, the domain should specify two decimal places.
Default Value: Since the meaning of the total is well-defined, a default value of 0.00 can be used for quotes without any line items. This allows the domain to be set as NOT NULL.
Maximum Value: To ensure that the total is within a reasonable range, a maximum value of 999 billion dollars can be specified.
Positive Value Only: To reflect the reality that quotes are usually positive, the domain can require that the total is a positive number (although there may be rare cases where a quote is negative).

The final definition for such a domain would be:

DO $$ BEGIN
  CREATE DOMAIN lib.currency_p2 AS decimal(17,2) NOT NULL
    CHECK (VALUE >= 0);
EXCEPTION
    WHEN duplicate_object THEN null;
END $$;

COMMENT ON DOMAIN lib.currency_p2 IS
'A currency with a required positive value under 999 billions dollars.';

When possible, define domains so they don't require a NULL value (see What is The Deal With Nulls).

Easier Querying: With a non-null constraint, there is no need to add extra checks for missing values in queries. This makes the code more concise and reduces the likelihood of errors.
Simplification of Code: By defining a column as NOT NULL, it can be relied upon that a value will always be present. This eliminates the need for additional checks in the code to handle missing values, making the code more readable and maintainable.
Improved Data Integrity: When a column requires a value, it is guaranteed that every row in the table will have a value in that column. This ensures that data is complete and accurate, reducing the chance of data errors and inconsistencies.
Better Performance: Indexes on non-null columns are more efficient, leading to faster query performance. In addition, storage is optimized because null values take up extra space in the database.
Simpler Constraint Management: Non-null constraints are easier to manage than complex checks for missing values. This helps maintain data integrity and reduce the likelihood of bugs in the code.
Consistent Data Model: By requiring values in all columns, a consistent data model is established. This makes it easier to write and maintain code and reduces the likelihood of errors.

It is important to note that while allowing NULL values in columns can be beneficial (improved storage space for example), the topic of architecting a database with regards to NULL values is a complex one and is best explored in a separate article.

Sql Domains in Practice

Below is an example of using the domains provided by the lib repository.

Defining the tag.tag table without using domains:

-- The tag.tag table without domains
CREATE TABLE IF NOT EXISTS tag.tag (
  tag_id uuid NOT NULL,
  enabled boolean NOT NULL DEFAULT true,
  sort_order int NOT NULL DEFAULT 100000,
  color_hex varchar(7) NULL
  CONSTRAINT shared_color_hex_invalid_expected
    CHECK (color_hex ~ '^#([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$'),
  tag_label_short varchar(32) NOT NULL,
  tag_label varchar(128) NOT NULL,
  tag_description varchar(4096) NOT NULL DEFAULT '',
  CONSTRAINT tag_tag_id_pk PRIMARY KEY (tag_id)
);

Defining the tag.tag table using domains:

CREATE TABLE IF NOT EXISTS tag.tag (
  tag_id lib.key_lhtag,
  enabled boolean NOT NULL DEFAULT true,
  sort_order int NOT NULL DEFAULT 100000,
  color_hex lib.color_hex NULL,
  label_short lib.label_short NOT NULL DEFAULT '',
  label lib.label NOT NULL,
  description lib.description NOT NULL DEFAULT '',
  tagid_range int4range NULL,
  CONSTRAINT tag_tag_id_pk PRIMARY KEY (tag_id)
);

Lib Domains

The following SQL domains are defined in the lib repository.

Identifiers

An identifier column, such as a primary key, is a unique identifier for each row in a database table. It serves as the reference point for establishing relationships with other tables through foreign key constraints. It is used for computer-related purposes and is not intended to be displayed to the user.

Purpose: computer-facing, foreign key
Data Type: uuid, auto increment, string, bigint, int, smallint
Indexed: Always
Nullable: Never
Domains: lib.key_uuid, lib.key_str_36, lib.key_str_64, lib.key_bigint, lib.key_int, lib.key_smallint,
Updatable: Expected never to change once set
Notes:

The value of an identifier column is expected to remain unchanged once it is set, making it a reliable way to reference a specific row of data. Cascade updates (ON UPDATE) are supported but can lead to database performance issues.
In some cases, such as lookup or enumerated value tables, the identifier may not be an auto-incrementing number. Also see Symbol/Key/Hash and Slug below.

Symbol/Key/Hash

A Symbol, Key, or Hash column serves as a unique identifier for a row of data within a table, but with the added advantage of being human-readable. This makes it more user-friendly, as opposed to a purely numeric or uuid value.

Symbol columns are used to establish relationships between tables through foreign keys and make it easier for developers to interact with entities in code. For example, using the symbol available to reference an item instead of its numeric id (e.g. Item.find_by_key(:available) instead of Item.find_by_id(7)) can enhance the readability and maintainability of the code.

Purpose: computer-facing, user-facing, foreign key, (find_by)
Data type: string
Indexed: Always
Nullable: Never
Domains: lib.key_symbolUpdatable: Expected never to change once set

Slug

A Slug column is a user-friendly identifier that is used to identify a unique row of data within a table and can be used to establish relationships between tables through foreign keys.

It is part of a URL and refers to the specific identifier that appears at the end of the URL. For example, in the URL “https://example.com/articles/the-importance-of-water", the-importance-of-water is the slug.

Purpose: computer-facing, user-facing, URL
Data type: string
Indexed: Recommended
Nullable: Never
Domains: lib.key_slug_128
Updatable: Expected never to change once set
Notes:

For more information, checkout What Is a Slug?

Hierarchical-Tree Tags

Tagging data associates a piece of information with one or more tags, helping to classify and organize the data. Hierarchical-Tree tags use a tree-like structure to create path-like tags. For example, a tag might look like device.display.monitor. These tags help to quickly and easily categorize and find data in various applications, such as image, audio, text classification, and search engines.

Purpose: computer-facing, user-facing
Data type: ltree
Indexed: Recommended
Nullable: Never
Domains: lib.key_htag, lib.key_lhtag
Updatable: Expected never to change once set
Notes:

Postgresql supports these using ltrees.
The lib.key_htag and lib.key_lhtag can also be used as non-hierarchical tags.

Label, Label Short, Title, and Description

Label/Label Short: A label, also known as a label short, is a brief, human-readable name for an item or entry. It is typically used in interfaces where space is limited and a concise identifier is needed, such as in drop-down list boxes or radio buttons.

Title: A title is a longer, more descriptive version of a label. It provides more context and detail about the item or entry, making it easier for users to understand its purpose and function.

Description: A description provides even more detailed information about an item or entry. It can include background information, usage instructions, or other relevant details. Descriptions are often used in tooltips, documentation, or other information-rich interfaces where more space is available.

Purpose: user-facing
Data type: string
Indexed: Dependent on Use Case
Nullable: Potentially but frowned upon
Domains: lib.label, lib.label_short, lib.title, lib.description
Updatable: Yes
Notes:

Try and requre labels, titles, and descriptions for entities. If they are optional, then try to default to an empty string unless the column needs to be unique.

Name

A name, usually in the form of a human-readable string, serves as the primary identifier and distinguishes entities from each other within the database.

Usage: user-facing
Data type: string
Indexed: Dependent on Use Case
Nullable: Potentially but frowned upon
Domains: lib.name
Updatable: Yes
Notes:

Try and requre a name for entities. If names are optional, then try to default to an empty string unless the column needs to be unique.

Percent

Represents a percentage value.

Purpose: user-facing
Data type: numeric, decimal(precison, rounding)
Indexed: Dependent on Use Case
Nullable: Potentially but frowned upon
Domains: lib.percent_7_1, lib.percent_7_1_null, lib.percent_7_2, lib.percent_7_2_null, lib.percent_7_3, lib.percent_7_3_null, lib.percent_7_4, lib.percent_7_4_null.
Updatable: Yes

Currency Value (Scalar) and Type (Enumeration)

Currency is often represented as a symbol and a scalar value, such as “$34.99”. To represent currency accurately, two pieces of information are usually required: the type of currency and the number of units.

A third piece of information, rounding, may also be necessary as storing currency values may require different decimal places. For example, Vietnamese Dong has no decimal places, while USD is represented in cents (two decimal places).

In cases where the price or cost of an item is less than the smallest unit of a currency, more decimal places may be needed. Additionally, converting from one currency type to another may require using more decimal places.

Rounding a currency value after a calculation can vary greatly based on the business domain. It is recommended to perform rounding within the database for consistency across an organization. As different languages approach rounding differently, performing rounding in the middle tier or front-end may result in different rounded values.

Purpose: user-facing
Data type: numeric, decimal(precison, rounding)
Indexed: Dependent on Use Case
Nullable: Potentially but frowned upon
Domains: lib.currency_0, lib.currency_0_null, lib.currency_2, lib.currency_2_null, lib.currency_4, lib.currency_4_null
Updatable: Yes
Notes:

Currency Scalars are defined in the lib repository, while Currency Types are defined in the iso repository.
Storing currency values as integers with a scalar applied (e.g. $4.25 as 425) is possible, but may lead to confusion regarding the scalar applied.
The use of composite types in PostgreSQL can simplify the representation of multiple currency values in a single row, but this may require additional knowledge for middle-tier and front-end consumption and may result in redundant currency type information.
For example: The representation of multiple currency values, such as product cost, price, and discount, in a single row will likely be consistent in terms of currency type, leading to the use of a single currency_type row for an order_item or order. The approach will depend on the specific business domain and requirements.

Conclusion

Leveraging domains in database design can significantly improve code readability and help ensure consistency across all software tiers — front-end, middle tier, and back-end.

By defining clear and concise domains for each field, the database structure can be made more intuitive and easier to understand for all members of a development team. This leads to improved collaboration, less errors and bugs, and a more efficient development process.

The careful and deliberate use of domains can greatly enhance the quality of database design, leading to improved code quality and a better overall development experience.

Try out the PostgreSQL library (lib). Additional domains are greatly appreciated.