Databases are collections of data organized for easy storage and access. In this article, we’ll break down their types, common use cases, and systems to manage them.
The overview will help you understand which database might be best for your needs and why.
Types of databases and DBMS: what are they best for?
How are databases and database managing systems classified?
In programming, there are many different tasks and operations related to the storage and use of data, coming in various forms. The need to adapt to this diversity, along with the development of Internet technologies, gave rise to multiple types of databases. The first repositories, which emerged with the advent of magnetic disks in the 1950s, are as different from modern versions as a Neanderthal is from a hipster.
Today, we primarily classify databases depending on the data model they support — or how data is arranged within them. The distinct types here are:
- relational or SQL databases that store information in related tables (also called structured data);
- non-relational or NoSQL databases where semistructured and unstructured data can be arranged as key-value pairs, graphs, wide columns, or documents;
- object-oriented databases reflecting data structures in object-oriented programming;
- hierarchical databases employing tree-based structures to keep information; and
- network databases exploiting flexible graph-like structures.
Additionally, databases are often categorized into
- row-based and columnar DBs — by data storage in tables;
- OLTP (online transaction processing), OLAP (online analytical processing), and time series DBs — by purpose; and
- centralized and distributed DBs — by deployment.
Some categories overlap, so the same database can belong to several groups. For instance, a traditional relational database is also row-based, OLTP-optimized, and centralized.
Databases are often mentioned alongside database management systems (DBMSs), so they might seem the same. But actually, a DMBS is a database plus software added to work with the data. Some DBMSs work only with a particular type of database, while others — known as multi-model — can accommodate different data models. For instance, ArangoDB supports document, key-value, and graph data models (all NoSQL).
Almost any DBMS can perform four basic data operations described by the acronym CRUD: create, read, update, and delete.
DBMS can be open source which means publicly accessible and often (but not always) free to use, or proprietary, available under license only (and usually paid).
To learn more about database management systems, you can read our article comparing the most commonly used DBMSs.
Relational databases (SQL databases)
Relational databases, or SQL databases, handle only structured data organized into a collection of related tables with rows and columns.
A row is an individual record containing complete information about a data item (for example, a book's title, author, number of pages, year of publishing, etc.). A column contains attributes of a single type for every row (for example, titles of all books in a table). Each row has a unique ID called a primary key.
Relationships between tables are maintained using foreign keys. They act as a "reference" to the primary ID from another table, linking related records and allowing complex queries across multiple tables.
PK = primary key, FK = foreigner key. All the rest is simple.
For instance, an online retail system might have tables for customers, orders, and products. A customer table would have rows for each customer and columns for details like name, address, and email. An orders table could reference the customer table using a foreign key, allowing the system to retrieve all orders associated with a specific customer.
One of the essential advantages of all relational databases is that they support ACID properties: Atomicity, Consistency, Isolation, and Durability—that define reliable database transactions. An ACID transaction ensures that all operations within a transaction are treated as a single, indivisible unit, and the whole chain of operations either fully succeeds or fails. Basically, it’s an all-or-nothing rule.
Imagine you are shopping online. Selecting a t-shirt and putting it in the shopping cart seems like a piece of cake, but it is a transaction involving a few database operations. If a single operation fails, the whole transaction will be rolled back. As a result, the t-shirt you were looking for is either put in the cart or not—there’s no middle ground. This guarantees reliability in complex systems.
Relational databases also ensure data integrity via certain constraint mechanisms. Say we have a table of authors and a table of posts. We can introduce a constraint that prevents records of authors who have published at least one post from being deleted. In this way, we guarantee that there will be no unattributed posts on our website. In non-relational DBs, such constraints are impossible.
Advantages:
- Data integrity
- ACID compliance
- Widely adopted, with many tools and resources
- Almost everybody in IT has at least basic knowledge of SQL (Structured Query Language) because it’s very simple.
Limitations:
- Struggle with handling vast data volumes
- No support for semistructured and unstructured data
- Not suitable for horizontal scaling (we’ll look into this issue in the next section)
- Rigid with schema changes
Best for transactional applications, such as eCommerce websites, banking systems, enterprise resource planning (ERP), and customer relationship management (CRM) systems, where data consistency, integrity, and complex querying capabilities are critical.
Popular data management systems for relational databases include MySQL and PostgreSQL (both open-source), Oracle Database, and Microsoft SQL Server.
NoSQL databases
NoSQL databases, also called non-relational databases, emerged in the late 2000s. As web applications and social media platforms expanded, so did the demand for databases that could scale horizontally, support semi-structured and unstructured data, and handle high-velocity data streams. To address new challenges, Google represented Big Table in 2006, and Amazon developed Dynamo DB in 2007. Another year later, Facebook opened the source code of Cassandra DB.
Let's clarify what horizontal scaling means and why it is essential when working with Big Data.
To load large amounts of data to a relational database that uses complex tables with strict schema, you must add resources (CPU, RAM, storage, etc.) to an existing server — to build it “vertically,” so to speak. Horizontal scaling means just adding other services. If we tried to do so in a relational DB, joining related tables for a complex query would take a lot of time. In contrast, a non-relational DB stores data flexibly in self-contained units, so data portions are more independent and can be easily handled by separate machines.
In addition to quickly increasing the data volume, horizontal scaling improves performance. When many users query the database, this increased workload is spread across different servers.
However, ACID compliance has to be sacrificed for horizontal scaling, at least to some extent. Data consistency is difficult to maintain if users can simultaneously update, delete, or add records to the database across multiple locations. No one can guarantee transaction integrity under such conditions. But some NoSQL DBs partially or completely support ACID properties using special mechanisms.
Advantages:
- Fleхible, handle various data structures
- Easily manage large, distributed datasets
- Support various data models (e.g., document, key-value, graph) tailored to specific use cases
Limitations:
- No built-in support for ACID properties
- Fewer standardized tools and practices compared to SQL databases
Best for applications with constantly evolving data models, optimized for read/write operations in big data and real-time apps.
Although some enthusiasts proclaimed the death of relational databases with the advent of NoSQL, they were too hasty. Time has shown that these are just different instruments, like a hammer and an axe; they will never replace one another. Now, let's look closer at an axe and why it's the perfect tool for chopping wood.
Key-value databases
Key-value DBs are the simplest type of NoSQL DBs. They store data as key-value pairs, where a unique key is associated with a specific value. It is similar to a library catalog, where each book has an index indicating its location: room, shelf, and a number on the shelf.
It takes a key as an input, uses a hash function to quickly identify the storage based on that key, and retrieves the required data in milliseconds.
Key-value database.
Best for applications with massive data sets and very simple data, where high performance and low latency are paramount. Examples are caching and storing user sessions (where the session ID is the key and the session data is the value), message brokers, leaderboards and counting in computer games, IoT applications handling sensor data with timestamps.
Popular DBMSs for key-value databases include Memcached, Redis (open-source multimodal), Amazon DynamoDB (multimodal), and Riak (open-source).
Graph databases
Graph databases are built to handle large amounts of data with complex relationships. In this context, the term "graph" doesn't refer to charts or drawings; in mathematics, a graph is a network of objects (nodes) connected by links (edges). For example, the H2O molecule model from a chemistry textbook or a map of railroads is a graph.
Ironically, it's not relational databases that excel at managing relationships between entities—graph databases are actually the ones designed for that.
Graph database looks like a map, doesn’t it?
If you need to find the best flight route from Pasadena to Bangalore, a relational database would have to sift through numerous tables to find the right combination.
In contrast, a graph database already has the connections between airports stored, making the search much faster. Graph query languages, such as Apache TinkerPop Gremlin or SPARQL, enable you to retrieve details like the number of hops between nodes, the optimal paths between nodes, and the properties of specific nodes. High performance is maintained even as new entities are added and relationships deepen or branch out.
Best for any applications where analyzing and managing relationships is crucial: social networks, recommender systems, and fraud detection through suspicious financial transactions. Since graph databases can represent complex links between words in a sentence or objects in an image, they are also well-suited for machine learning.
Popular DBMSs for graph databases include Neo4j, JanusGraph (both open-source), and Amazon Neptune. ArangoDB and OrientDB are multi-model databases that support graphs, among other non-relational DBs.
Wide-column databases
Wide-column database.
Wide-column databases store data in rows and columns, and they can look like tables, but they are not. Unlike relational databases, wide-column databases use a flexible schema where each row can keep any number of different columns.
Say one row might contain the following columns: “Name: Harry Potter; House: Gryffindor; Pet: Hedwig (Owl),” while the second row would look like this: “Name: Hermione Granger; House: Gryffindor; Favorite Subject: Arithmancy; Books Borrowed: Advanced Rune Translation.”
Unlike relational databases, which read rows one by one to search for the information being queried, querying a wide-column database means searching first by columns or groups of columns, called a column family.
After selecting the needed column or column family, the DBMS uses a partition key to retrieve the relevant row or rows within that column. The sort key orders rows within a partition to refine a search's results. So, the keys for the query to find Taylor Swift would be lined up in the following sequence: “Performer”/”2”/”Album ID,” where “Performer” is a column name.
New columns can be added or deleted to individual rows without affecting the entire dataset. The data in a wide-column DB can be effectively compressed, saving disk space, because you don’t have to store empty cells.
Best for use cases where scalability and performance are critical, such as business intelligence applications, data warehousing, and IoT apps.
Popular DBMS for wide-column databases include Apache Cassandra, HBase, and ScyllaDB (all open-source).
Document-oriented databases
An example of a JSON-formatted document is in a document-oriented database.
Document-oriented DBs store semi-structured data as documents in JSON, BSON, or XML format, but mainly JSON. The latter is a lightweight data-interchange format that is easy to read and write. A document contains data about a single object along with its associated metadata. Inside a document, data is organized into field-value pairs, where values can take on various forms, such as strings, numbers, dates, arrays, or even nested objects. Texts, images, and videos (unstructured data) can also be stored as values.
If there is a nested object, then a document is a hierarchical structure: a vanity bag in a suitcase, a suitcase in a trunk, a trunk in a car. This is how a nested object looks in JSON format: {"address": {"country": "USA," "state": "Nevada"}}. In this case, an address is a key and a nested object simultaneously.
Let’s look at the simple use case for this flexible, easy-to-use data model. For example, if you're developing a content management system where each document (such as an article or blog post) may have a different structure—one with an author, another anonymous, but with tags, and another with embedded media—a document-oriented database would allow you to store all these variations. Queries can be performed directly on the content of documents.
Best for content management systems for user-generated content, social networks, catalogs, user profiles, gaming applications, and sensor management systems.
Popular DBMS for document-oriented databases include MongoDB (source-available), Couchbase (has an open-source version), Amazon DocumentDB (proprietary), and RavenDB (open-source).
Object-oriented databases
Object-oriented databases store data in a way that directly mirrors the structure used in object-oriented programming languages like Java or C++. You can create and access this database with the same programming languages. This results in less coding and eliminates data compatibility issues between the database and the app logic.
For those who have not encountered object-oriented programming languages, here is an obscenely simplified description of how it works. Anything can be an object: a person, a product, a business operation. Similar objects belong to classes. Attributes describe the state of objects in the same class; methods, or functions, describe behavior.
When creating a driving game, a developer first designs the class "cars" with properties like name, color, speed, and fuel level. Then, a cheerful red car called Lightning McQueen is born as an object belonging to that class. Later, when modeling actions, a programmer can simply use methods to say in the code: "McQueen, accelerate" or "McQueen, turn left." Object-oriented DB stores this object as-is, preserving its complex structure and relationships with fellow cars.
The flip side of the medal is that this type of database is rigid. If you want to add a new property type (i.e., “weight”) to a class of objects, you must update the entire database.
The OODBs' history began in 1985, but they are nearly out of business by now—both relational and non-relational databases have surpassed them in popularity.
Advantages:
- Easier to mock-up advanced real-world problems
- Reusability of data models: You can create a new class using the properties and methods of the existing one instead of reinventing the wheel.
Limitations:
- Lack of widely accepted standards
- Slower for simple queries compared to relational databases
Best for CRMs, computer-aided design (CAD) and engineering apps, navigation software, air traffic control systems, game design, and other products where data structures are complex and deeply interconnected.
The best known DBMSs for object-oriented databases include db4o, ObjectDB, and InterSystems Caché.
Hierarchical database
Hierarchical database: an extended family.
IBM developed the first hierarchical database in 1968, so it’s the oldest data model. It organizes data in a tree-like structure, featuring nodes connected by branches. While each node has only one parent, a parent can have multiple child nodes, creating a clear, branching hierarchy. In the coder's language, this type of DB implements 1:1 (parent-child) and 1:n (parent-multiple children) relations.
A File Explorer (formerly Windows Explorer) is an excellent example of a hierarchical data model.
Advantages:
- Very simple
- Ensures data consistency
Limitations:
- Highly rigid
- Vendor lock-in (the only vendor is IBM)
Still used for applications like inventory management systems or financial record-keeping, where structured and organized data is crucial.
The database management system for hierarchical databases is IBM Information Management System (IMS).
Network databases
Network database: each child can have many parents; isn’t it fair?
The network database uses a flexible graph-like structure where each record can have multiple parent and child records. This model allows for many-to-many relationships (a child can have many parents).
In 1969, Charles Bachman introduced the network model to enhance the hierarchical database model. In this context, "network" does not relate to computer networking but refers to the relationships between various data entities.
For example, in a supply chain system, a product might be provided by multiple vendors and used in various manufacturing processes. A network database can model these relationships, ensuring quick access to interconnected data.
Advantages:
- Design and management flexibility compared to hierarchical DBs
Limitations:
- Vendor lock-in: Broadcom is the owner of the Integrated Database Management System (IDMS)
- No standardized query language requiring custom, database-specific methods to manipulate data
- Often require manual traversal of the network structure
Still used for telecommunication systems.
Popular database management systems for network databases include the Integrated Database Management System (IDMS) and RaimaDB (which combines the network and relational model).
Databases designed for specific purposes
Databases designed for specific purposes are tailored to handle specialized use cases efficiently. For example, time-series databases excel at managing and querying data collected over time, such as sensor readings or financial market data.
Here, we’ll review three main types.
OLTP vs. OLAP optimized databases
OLTP (Online Transaction Processing) focuses on managing day-to-day CRUD operations and handling high transaction volumes, like processing individual sales transactions at a retail store. OLTP systems organize data into 2-dimensional tables using a relational database model. This architecture prioritizes data-writing operations to handle large volumes of record updates. OLTP databases ensure fast, reliable transactions and maintain real-time data consistency.
OLAP (Online Analytical Processing) is designed for complex queries and data analysis, enabling users to generate reports and insights from large datasets. For example, a business intelligence tool that analyzes sales trends across different regions is an OLAP application. The architecture of OLAP databases strongly focuses on optimizing data reading operations rather than writing.
OLAP systems function as a central data hub, gathering information from various data warehouses, relational databases, and other sources.
To learn more about cloud-based warehousing and analytics, read our article on Snowflake. This platform provides fast, flexible, and easy-to-use data storage, processing, and analysis options.
When used as a warehouse, an OLAP database has two common schemas.
Star schema vs. snowflake schema.
The star schema consists of a central fact table for quantitative data (like sales amounts, quantities sold, and revenue), surrounded by dimension tables for descriptive data (like product details: product name, category, and price; or store locations: city, region, and country). The fact table connects to dimension tables via foreign keys, creating a star-like shape that simplifies queries and reporting.
The snowflake schema extends the star schema by splitting dimension tables into multiple subdimensions, reducing redundancy. For example, if many products in the product dimension table fall under the same category, the category name will be repeated across multiple rows. By splitting the product dimension table into a “Product” table (with product ID and name) and a “Category” table (with category ID and category name), we avoid duplications. This more sophisticated structure, resembling a snowflake, saves storage space and improves data integrity. However, it can complicate and slow down queries since you'd need to join multiple tables to get the same information.
Here, you can read an article about the traditional data warehouse approaches, like star or snowflake schemas, compared with data vault architecture.
Popular databases optimized for OLTP: PostgreSQL, MySQL (both open-source), and Oracle.
Popular databases optimized for OLAP: Apache Pinot, DuckDB, and ClickHouse (all three are open-source).
However, some database management systems can handle both OLTP and OLAP, such as relational Umbra Database and PostgreSQL (which has OLAP extensions called Hydra and Citus). SAP HANA and ClickHouse, primarily columnar and OLAP optimized, also support mixed workloads.
Time-series databases
Time-series databases are optimized to handle and store time-stamped or time-ordered data. Unlike traditional relational repositories designed for general-purpose data storage, time-series DBs focus on efficiently managing data where time is a critical element.
Each data point in the time-series DB is associated with a timestamp, making it easy to track changes over time. For example, in a monitoring system, the database would record the CPU usage of a server every minute, allowing you to analyze trends or detect anomalies. The database is designed to handle high write loads and query efficiency over large datasets, which is critical when dealing with rapidly growing time-series data.
Time-series databases also offer features like downsampling: As data gets older, the DB can store it at a lower detail level. Instead of keeping every minute of data, it might save just the hourly averages. This helps manage storage and keeps important trends visible. Data retention policies automatically delete older data that is no longer needed.
Best for system performance monitoring, financial market analysis, IoT sensor data collection, and tracking user activity logs.
Read our article on time series forecasting and analysis to learn about more use cases.
Popular DBMS for time-series databases include InfluxDB, Prometheus, and TimescaleDB (all open-source).
Columnar vs. row-based databases
Some databases can be classified as either columnar or row-based. Think of it as the difference between reading a book in traditional Japanese vertical writing or English.
Row-based vs. columnar DB: reading from left to right or top to bottom.
Classical relational databases are row-based. However, some modern cloud-based relational DBMSs support columnar storage. NoSQL wide-column DB is sometimes considered a subtype of columnar databases: It stores data in rows, but only within column families.
Other database models have unique ways of storing information and don’t fall into these two categories.
Row-based databases
Row-based databases store data in rows, each containing a complete record. This means that the data from the first column of a new row is stored directly after the data from the last column of the previous row.
Let’s say we have a table called “Customers” with columns like “Name,” “Age,” “Address,” and “Phone.”
A table example.
In a row-based DB, the data is stored, read, updated, or deleted in the following order: “John Doe, 30, 123 Elm St, Springfield, 555-1234 / Jane Smith, 25, 456 Oak St, Metropolis, 555-5678 / Alice Johnson, 35, 789 Pine St, Gotham, 555-9876.”
This traditional relational database structure is optimal for transactional operations where entire records are frequently accessed and modified.
Advantages:
- Fast for CRUD operations
- Handles diverse, complex queries well
- ACID compliance
- Mature ecosystem: extensive tools, resources, and community support
Limitations:
- Slow data aggregation (since all data should be loaded before finding and pulling out relevant pieces)
- Insufficient compression
- Needs a lot of storage space
Best for OLTP (Online Transaction Processing) systems, such as financial systems, eCommerce platforms, and other applications that require fast write operations and quick retrieval of complete records.
Popular DBMSs for row-based databases include MySQL, PostgreSQL, Oracle, and IBM Db2. In the non-relational (NoSQL) database world, some databases like Apache Cassandra and Amazon DynamoDB can work with a row-based structure, although they primarily are wide-column stores.
Columnar DB
Columnar, or column-based databases, store and process data in columns rather than rows. The data in the “Customer” table mentioned above would be stored in the following order: “John Doe, Jane Smith, Alice Johnson / 30, 25, 35 / 123 Elm St, 456 Oak St, 789 Pine St / Springfield, Metropolis, Gotham / 555-1234, 555-5678, 555-9876.” Each column is stored independently on a disk.
This structure is particularly efficient for analytical queries that scan large datasets for specific attributes, as it minimizes the data read from disk.
For example, in a sales database, a query calculating the total revenue might only need to access the "price" and "quantity" columns. A columnar database can retrieve just these columns, skipping over unrelated data, which speeds up the query significantly. It efficiently handles large-scale data, making it suitable for big data analytics.
Advantages:
- Easy data compression, reducing storage costs
- Scalable, handles growing data volumes
- Efficient bulk inserts and updates
- Flexibility
- Fast data aggregation
Limitations:
- Data modification is slower than in row-based databases
- Less effective for real-time data processing
Best for online analytical processing (OLAP) applications where read-heavy operations, such as aggregations and filtering, dominate. Examples of OLAP apps are business reporting for sales, marketing, management, business process management (BPM), budgeting, and forecasting.
Popular DBMS for columnar databases include Monet DB, Apache Cassandra (both open-source), SAP HANA (multi-model), and ClickHouse DB (open-source).
Centralized vs. distributed databases
This classification refers to the architecture rather than specific technology. Any database managing system can be used for a centralized or distributed architecture, but some DBMSs are better for supporting particular types.
Centralized vs. distributed databases.
Centralized databases
A centralized database resides in one location, such as a host computer. It is maintained and modified only from this location and is usually accessible via an Internet connection such as a LAN or WAN.
Advantages:
- Data security in terms of hacker attacks because all data is stored in one place
- Data is not duplicated
- Maintains integrity
- Cheap because less maintenance and energy are required.
Limitations:
- Traffic is high, efficiency is low
- If something happens to the central server, all data may be lost.
Best for applications where data consistency and control are critical. They are ideal for small to medium-sized organizations with a single location or tightly controlled environments, such as banking products and enterprise resource planning (ERP) systems.
Popular database management systems for centralized databases: Microsoft SQL Server, Oracle DB, PostgreSQL, and MySQL. These DBMSs focus on delivering high performance, security, and management capabilities for databases hosted on a single server or within a centralized infrastructure.
Distributed databases
In distributed databases, data is spread across multiple physical locations or servers, yet thanks to a unified interface, it appears as a single database to users.
Homogeneous distributed database systems use the same data model across all locations, while heterogeneous distributed databases allow for different data models.
Advantages:
- Easy to scale
- Data remains available even if one location fails.
Limitations:
- Data can be duplicated
- Maintaining data integrity and security becomes more challenging.
Best for global eCommerce platforms, real-time analytics, multi-national corporations needing consistent data across various geographic regions, and cloud services where data must be accessible from multiple locations simultaneously.
Popular database management systems for distributed databases: Apache Cassandra, Amazon DynamoDB, CockroachDB, and MongoDB. They are built to handle data replication, fault tolerance, and horizontal scaling across multiple nodes and geographic locations.