Database

Best Practices 


Four core principles

  1. Accuracy
  2. Accessibility
  3. Minimum Redundancy
  4. Meet Expectations

 

Key Points of Database Design

  1. Properly verify all the business requirements.
  2. Choose the right database type based on the business needs.
  3. Consider every viewpoint during database planning.
  4. Apply normalization rules.
  5. Build a transparent structure.
  6. Divide proper information into tables.
  7. Turn information items into columns.
  8. Follow the naming convention.
  9. Define primary keys.
  10. Define foreign keys and set table relationships.
  11. Define indexes wherever needed.
  12. Design stored procedures wherever needed.
  13. Design views wherever needed.
  14. Design triggers wherever needed.
  15. Document everything.
  16. Think about the database server. It should be separate from the application server.
  17. Think about read/write operations; consider separating reads and writes onto different database servers (e.g., read replicas).
  18. Separate read/write operations especially when usage is heavy.
  19. Think about speed: design with query performance in mind.
  20. Keep configurable values in database tables rather than hard-coding them, so they can be modified easily during development.

 

Documentation

  • Every project has database documentation in DB DOCS TOOL so the schema can be viewed.
  • Every project's documentation link should be password protected. Follow this guideline.

 

Naming Conventions

Best practices for naming databases, tables, columns, and fields:

  1. Use all-lowercase names.
  2. Use the shortest names that are still intuitive.
  3. Names must be unique (at least within that particular database).
  4. Names cannot contain spaces; if a name has more than one word, use an underscore (_).
  5. Names cannot contain special characters (/, @, !, &, \, etc.).
  6. Simple rule: use only alphanumeric characters and underscores.
  7. Begin names with an alphabetic character.
  8. Relationships: use active or passive verbs that indicate the relationship.

 

Table Design

  • Decide on a table-name prefix; it gives the project's tables a unique identity.
  • Table names should be plural, ending with an 's' (e.g., products, customers).
  • Each table must have a primary key.
  • Each table that references another table must define a foreign key.
  • The following four audit columns are compulsory: created_by, updated_by, created_datetime, updated_datetime.
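
The table rules above (primary key, foreign key, plural names, compulsory audit columns) can be sketched as follows. This is a runnable illustration using Python's built-in sqlite3; the table names are illustrative, and the SQLite types should be adapted for PostgreSQL (e.g., SERIAL, TIMESTAMP):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

conn.execute("""
    CREATE TABLE customers (
        customer_id      INTEGER PRIMARY KEY,       -- every table has a primary key
        customer_name    TEXT NOT NULL,
        created_by       TEXT NOT NULL,             -- compulsory audit columns
        updated_by       TEXT NOT NULL,
        created_datetime TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_datetime TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id         INTEGER PRIMARY KEY,
        customer_id      INTEGER NOT NULL REFERENCES customers(customer_id),
        created_by       TEXT NOT NULL,
        updated_by       TEXT NOT NULL,
        created_datetime TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_datetime TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")

conn.execute("INSERT INTO customers (customer_name, created_by, updated_by) "
             "VALUES ('Acme', 'admin', 'admin')")
conn.execute("INSERT INTO orders (customer_id, created_by, updated_by) "
             "VALUES (1, 'admin', 'admin')")
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```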

 

Column Design

  • Each column name must be unique within its table.
  • Keep column names as short as possible.
  • If a column references another table's column, give it the same name with an _id suffix (example: if the customers table has a customer_id column, the referencing column in the orders table should also be customer_id).
  • Columns with default values should not allow NULLs.
  • Set the appropriate data type and size for each column.
  • Use TEXT or LONGTEXT data types for storing large textual data wherever needed.
  • For Unicode values and multilingual data, use TEXT data types (in MySQL, emojis like 🌟 require the utf8mb4 character set).

Indexes

Primary Key Indexes

Primary key indexes are automatically created when a primary key is defined on a table. They ensure uniqueness and fast retrieval of records based on the primary key.

Use Case:

  • Fast retrieval of records using primary key values.
  • Enforcing unique constraints.

 

Unique Indexes

Unique indexes ensure that all values in the indexed columns are distinct.

Use Case:

  • Enforcing uniqueness constraints on columns, such as email addresses or usernames.
  • Preventing duplicate entries in a column, e.g., CREATE UNIQUE INDEX idx_user_email ON users (user_email);
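
A runnable sketch of the unique-index behavior, using Python's sqlite3 (table and column names follow the example above). A second insert with the same email is rejected by the index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_email TEXT NOT NULL)")
conn.execute("CREATE UNIQUE INDEX idx_user_email ON users (user_email)")

conn.execute("INSERT INTO users (user_email) VALUES ('a@example.com')")
try:
    # Second insert with the same value violates the unique index
    conn.execute("INSERT INTO users (user_email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```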

 

Non-Unique Indexes

Non-unique indexes improve the performance of queries by allowing quick lookups based on the indexed columns.

Use Case:

  • This index is used for speeding up search queries.
  • Enhancing the performance of SELECT queries that use WHERE clauses, e.g., CREATE INDEX idx_email ON users (email);

 

Composite Indexes

Composite indexes (multi-column indexes) are created on multiple columns. They are useful when queries filter on multiple columns.

Use Case:

  • Queries that filter or sort based on multiple columns.
  • Enhancing the performance of queries with complex WHERE clauses involving multiple columns, e.g., CREATE INDEX idx_user_name_email ON users (username, email);
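
A runnable sqlite3 sketch showing a composite index serving a query that filters on both columns; EXPLAIN QUERY PLAN confirms the index is chosen. Note the leftmost-prefix rule: an index on (username, email) also serves queries filtering on username alone, but not on email alone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_user_name_email ON users (username, email)")
conn.executemany("INSERT INTO users (username, email) VALUES (?, ?)",
                 [(f"user{i}", f"user{i}@example.com") for i in range(100)])

# Ask the planner how it would execute a two-column equality filter
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE username = ? AND email = ?",
    ("user42", "user42@example.com"),
).fetchall()
plan_text = " ".join(str(row) for row in plan)
print(plan_text)  # the plan mentions idx_user_name_email
```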

 

Full-Text Indexes

Full-text indexes enable efficient searching of text data. They are particularly useful for searching large text fields.

Use Case:

  • Implementing full-text search functionality in applications, such as searching for keywords in articles, documents, or product descriptions, e.g., CREATE FULLTEXT INDEX idx_article_content ON articles (content); (MySQL syntax; PostgreSQL instead uses tsvector columns with GIN indexes).
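
As a runnable sketch of the same idea, SQLite's FTS5 extension (bundled with most Python builds) provides keyword search over text columns; the articles table and its rows here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table: every text column is full-text indexed
conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, content)")
conn.executemany("INSERT INTO articles (title, content) VALUES (?, ?)", [
    ("Indexing", "Full-text indexes enable efficient searching of text data."),
    ("Caching", "Redis is often used for application caching."),
])

# MATCH performs a full-text (case-insensitive) keyword search
rows = conn.execute("SELECT title FROM articles WHERE articles MATCH 'caching'").fetchall()
print(rows)  # [('Caching',)]
```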

 

Partial Indexes

Partial indexes are created on a subset of rows in a table, based on a condition specified in the WHERE clause.

Use Case:

  • Indexing frequently queried subsets of data, such as active users in a users table, e.g., CREATE INDEX idx_active_users ON users (last_login) WHERE active = TRUE; which can then serve SELECT * FROM users WHERE active = TRUE ORDER BY last_login; (PostgreSQL syntax; MySQL does not support partial indexes).
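
SQLite also supports partial indexes, so the example can be run directly (with TRUE written as 1); the data here is fabricated purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, last_login TEXT, active INTEGER)")
# Partial index: only rows matching the WHERE clause are indexed
conn.execute("CREATE INDEX idx_active_users ON users (last_login) WHERE active = 1")
conn.executemany("INSERT INTO users (last_login, active) VALUES (?, ?)",
                 [("2024-01-0%d" % (i % 9 + 1), i % 2) for i in range(20)])

# The planner can use the partial index when the query repeats the index's condition
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE active = 1 ORDER BY last_login"
).fetchall()
plan_text = " ".join(str(row) for row in plan)
print(plan_text)
```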

 

Clustered Indexes

Clustered indexes determine the physical order of data in a table. Each table can have only one clustered index.

Use Case:

  • Optimizing range queries, as data is stored in the same order as the index, e.g., CREATE CLUSTERED INDEX idx_order_date ON orders (order_date); (SQL Server syntax; in PostgreSQL use the CLUSTER command, e.g., CLUSTER orders USING idx_order_date; and in MySQL/InnoDB the primary key is the clustered index).

Optimization

1) Avoid writing raw queries in Node.js and Laravel; an ORM helps you write optimized and secure queries more easily.

2) Optimize queries

  • Avoid SELECT *: Fetch only the columns you need.

  • Avoid the N+1 Query Problem: Use joins or batch queries instead of writing queries in a loop.
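
The N+1 problem can be sketched concretely. This runnable sqlite3 example (illustrative customers/orders data) computes per-customer totals twice: first with one query per customer, then with a single JOIN + GROUP BY that produces the same result in one round trip:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0);
""")

# N+1 anti-pattern: one query per customer inside a loop (N extra round trips)
totals_slow = {}
for (cid,) in conn.execute("SELECT customer_id FROM customers"):
    totals_slow[cid] = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?", (cid,)
    ).fetchone()[0]

# Better: one JOIN + GROUP BY query, one round trip
totals_fast = dict(conn.execute("""
    SELECT c.customer_id, COALESCE(SUM(o.amount), 0)
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
"""))
print(totals_slow == totals_fast)  # True
```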

3) Application Caching

  • Client Libraries: Use libraries like node-cache or redis to implement caching in your application.

  • node-cache: for smaller, in-process data.

  • Redis: for larger data, or data shared across processes and servers.
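
To illustrate what node-cache provides, here is a hypothetical, minimal in-process TTL cache sketched in Python (the TTLCache class and its API are invented for illustration; for larger or shared data a Redis client would replace the in-memory dictionary):

```python
import time

class TTLCache:
    """Minimal in-process cache with per-entry expiry (what node-cache offers in Node.js)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.set("user:1", {"name": "Alice"})
print(cache.get("user:1"))  # {'name': 'Alice'}
time.sleep(0.1)             # wait past the TTL
print(cache.get("user:1"))  # None
```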

4) Using PostgreSQL's Built-in Caching

PostgreSQL itself has various caching mechanisms built in, such as:

  • shared_buffers: caches frequently accessed disk blocks in memory.

  • work_mem: memory used for internal sort operations and hash tables before spilling to disk.

  • effective_cache_size: helps the query planner estimate the amount of disk cache available from the operating system and PostgreSQL.

To optimize these settings, you can modify the postgresql.conf file:

  • shared_buffers = 1GB

  • work_mem = 64MB

  • effective_cache_size = 3GB

5) Views:

  • A database view is a virtual table based on the result of a SQL query.

  • Use a view when you need to run the same multi-join query frequently.

  • Views can simplify complex queries by encapsulating them behind a single name.
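
A runnable sqlite3 sketch of the idea: the join and aggregation are defined once in a view (names here are illustrative), and callers query the view like an ordinary table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 20.0);

    -- Encapsulate the multi-table query once; callers just SELECT from the view
    CREATE VIEW customer_order_totals AS
        SELECT c.customer_id, c.customer_name, SUM(o.amount) AS total_amount
        FROM customers c
        JOIN orders o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.customer_name;
""")

row = conn.execute("SELECT customer_name, total_amount FROM customer_order_totals").fetchone()
print(row)  # ('Acme', 30.0)
```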

6) Database Caching Engines: 

  • If your requirements match these features, use this kind of tool.

  • Pgpool-II is a middleware tool that sits between PostgreSQL clients and PostgreSQL servers. It provides several features to enhance the performance, scalability, and availability of PostgreSQL databases. Some of the key features include connection pooling, load balancing, automatic failover, and query caching.

7) DB connection pooling

  • Use DB connection pooling for database connection management.

  • Max Connections: Set the max property in the pool configuration to control the maximum number of connections in the pool. The optimal number depends on your application and database server capabilities.

  • Idle Timeout: The idleTimeoutMillis setting controls how long a connection can be idle before being closed. This helps free up resources.

  • Error Handling: Always handle errors when working with pools to ensure connections are properly released back to the pool.

  • Resource Cleanup: Ensure that connections are released back to the pool after use, typically in a finally block.
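
The points above can be sketched with a hypothetical minimal pool in Python (the ConnectionPool class is invented for illustration; in Node.js, pg's Pool implements this for you via its max and idleTimeoutMillis settings):

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal pool sketch: a bounded set of reusable connections."""

    def __init__(self, factory, max_connections):
        # Max Connections: the queue's maxsize caps the pool size
        self._pool = queue.Queue(maxsize=max_connections)
        for _ in range(max_connections):
            self._pool.put(factory())

    def acquire(self, timeout=1.0):
        # Blocks until a connection is free; raises queue.Empty on timeout
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), max_connections=2)

conn = pool.acquire()
try:
    result = conn.execute("SELECT 1").fetchone()[0]
finally:
    pool.release(conn)  # Resource Cleanup: always return the connection, even on error
print(result)  # 1
```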

8) DB trip reduction techniques:

  • A stored procedure lets you bundle multiple database queries into a single transaction, so you can perform several queries in one round trip.
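
SQLite (used here so the sketch runs standalone) has no stored procedures, so this example demonstrates the underlying trip-reduction idea by batching several statements into one transaction; in PostgreSQL, the same statements would live inside a stored procedure or function that the application calls once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance REAL);
    INSERT INTO accounts VALUES (1, 100.0), (2, 50.0);
""")

# One transaction, several statements: the stored-procedure idea of bundling
# related queries so the application makes a single round trip
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 25 WHERE account_id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 25 WHERE account_id = 2")

balances = dict(conn.execute("SELECT account_id, balance FROM accounts"))
print(balances)  # {1: 75.0, 2: 75.0}
```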

9) Triggers : 

Triggers are specialized stored procedures that automatically execute in response to certain events on a particular table or view in a database. Use them based on your requirements; they can help reduce database calls.

  • Trigger Events: The events that activate the trigger. Common events include:

    • INSERT: When a new row is added to a table.

    • UPDATE: When an existing row is modified.

    • DELETE: When a row is removed from a table.

  • Trigger Timing: Defines when the trigger's actions are executed relative to the triggering event:

    • BEFORE: The trigger fires before the operation is executed.

    • AFTER: The trigger fires after the operation is executed.

    • INSTEAD OF: For views, this allows the trigger to replace the usual operation.

  • Trigger Scope: Specifies whether the trigger operates at the row level (FOR EACH ROW) or statement level (FOR EACH STATEMENT).

  • Real use case: we have two tables, users and user_logs. When any record in users is updated, a log entry is automatically added to user_logs via a trigger, which helps reduce DB calls.
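
The users/user_logs use case above can be sketched as a runnable sqlite3 example (column names are illustrative); the AFTER UPDATE, row-level trigger writes the log inside the database, so the application makes no extra call:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE user_logs (log_id INTEGER PRIMARY KEY, user_id INTEGER,
                            old_name TEXT, new_name TEXT,
                            logged_at TEXT DEFAULT CURRENT_TIMESTAMP);

    -- Event: UPDATE; Timing: AFTER; Scope: FOR EACH ROW
    CREATE TRIGGER trg_users_update AFTER UPDATE ON users
    FOR EACH ROW
    BEGIN
        INSERT INTO user_logs (user_id, old_name, new_name)
        VALUES (OLD.user_id, OLD.user_name, NEW.user_name);
    END;

    INSERT INTO users VALUES (1, 'alice');
""")

# A single application-side UPDATE; the trigger writes the log row for us
conn.execute("UPDATE users SET user_name = 'alicia' WHERE user_id = 1")
log = conn.execute("SELECT user_id, old_name, new_name FROM user_logs").fetchone()
print(log)  # (1, 'alice', 'alicia')
```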

Security

  • Do not use the default username (e.g., "postgres"); create your own username with a strong password.
  • Use roles to manage permissions for groups of users. This simplifies granting and revoking access.
  • Use strong, unique passwords for database accounts.
  • Encrypt sensitive and user data for GDPR (General Data Protection Regulation) compliance.
  • Databases should not be publicly accessible; grant access only to specific IP addresses.
  • Use VPNs for secure remote access to the database.

     

Backup and Recovery

  • Regular Backups: Implement a regular backup strategy to prevent data loss. 

    • Set up a lifecycle manager in AWS.

    • For other hosting providers, use an automated backup service.

    • If neither option is available, take manual backups.

  • Automated or manual: restore backups periodically in a test environment; a backup can sometimes fail to restore due to errors in the backup file.

  • Disaster Recovery Plan: 

    • Develop and test a disaster recovery plan to ensure quick recovery from data loss incidents.
