Nikhil Kumawat
- Mar 10, 2023
- 5 min read

Which data model to use (relational, document, or graph)?

Relational Data Model

The best-known data model today is probably that of SQL, based on the relational model proposed by Edgar Codd in 1970: data is organized into relations (called tables in SQL), where each relation is an unordered collection of tuples (rows in SQL). The original use cases for the relational model were transaction processing (entering sales or banking transactions, airline reservations, and stock-keeping in warehouses) and batch processing (customer invoicing, payroll, and reporting).

The relationships between tables are established by defining the primary and foreign keys. The primary key is a unique identifier for each row in a table, while the foreign key is a field in one table that links to the primary key in another table. This allows related information to be connected across multiple tables and enables complex queries to be performed that retrieve data from multiple tables at once.
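As a minimal sketch of the idea above, the following uses Python's built-in sqlite3 module with two hypothetical tables, customers and orders (the table names, columns, and sample rows are assumptions made up for illustration). The foreign key orders.customer_id points at the primary key customers.id, and a single query joins the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,          -- primary key: unique per row
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key
        total       REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO orders (id, customer_id, total) VALUES
        (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# One query retrieves data from both tables by following the key relationship.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(rows)
```

Each result row holds a customer name, the number of orders, and the order total for that customer.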

Situations where the relational data model is a good fit:

Structured Data: The relational model is ideal for structured data, where the data can be organized into tables with defined relationships between them.
Consistency and integrity: The relational data model ensures that data is consistent and accurate, which is important in applications where data integrity is critical. It provides mechanisms for enforcing constraints on the data.
Ad hoc queries: The relational data model supports ad hoc queries, allowing users to retrieve data based on complex search criteria.
Transactions: The relational data model provides transaction support, allowing multiple users to access and modify data simultaneously. It ensures that data remains consistent even when multiple users are accessing and modifying it at the same time.
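The constraint and transaction points above can be sketched with sqlite3. The accounts table, the CHECK constraint, and the transfer function are hypothetical, chosen only to illustrate the behavior. The sketch shows atomicity within a single connection: when a constraint rejects one statement, the whole transaction is rolled back. It does not demonstrate concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- integrity constraint
    )
""")
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False

first = transfer(conn, 1, 2, 30)    # fits within the balance
second = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(first, second, balances)
```

In the second call the credit to account 2 succeeds before the debit fails, yet the rollback undoes it, so the balances reflect only the first transfer.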

The Birth of NoSQL

There are several driving forces behind the adoption of NoSQL databases, including:

A need for greater scalability than relational databases can easily achieve, including vast datasets or very high write throughput.
A widespread preference for free and open-source software over commercial database products.
Specialized query operations that are not well supported by the relational model.
Frustration with the restrictiveness of relational schemas, and a desire for a more dynamic and expressive data model.

Different applications have different requirements, and the best choice of technology for one use case may differ from the best choice for another.

Relational versus Document Databases Today

The main arguments in favor of the document data model are schema flexibility, better performance due to locality, and that for some applications it is closer to the data structures used by the application. The relational model counters by providing better support for joins, and for many-to-one and many-to-many relationships.

When to go with Document Databases

1. If there are few relationships in the data:

The document model is a good fit when your data has few relationships between records. If your application does use many-to-many or many-to-one relationships, the document model becomes less appealing. It's possible to reduce the need for joins by denormalizing, but then the application code needs to do additional work to keep the denormalized data consistent. Joins can be emulated in application code by making multiple requests to the database, but that also moves complexity into the application and is usually slower than a join performed by specialized code inside the database. In such cases, using a document model can lead to significantly more complex application code and worse performance.
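To make "emulating a join in application code" concrete, here is a small sketch. The two dictionaries stand in for collections in a document database, and fetch is a hypothetical helper representing one round trip to that database; none of these names come from a real driver.

```python
# Hypothetical in-memory "collections", keyed by document ID.
organizations = {
    "org1": {"name": "Acme Corp"},
    "org2": {"name": "Globex"},
}
users = {
    "u1": {
        "name": "Asha",
        "positions": [
            {"title": "Engineer", "org_id": "org1"},  # many-to-one reference
            {"title": "Manager", "org_id": "org2"},
        ],
    },
}

round_trips = 0

def fetch(collection, doc_id):
    """Stand-in for one request to the database."""
    global round_trips
    round_trips += 1
    return collection[doc_id]

def load_profile(user_id):
    """Resolve organization references by hand, one request per reference."""
    user = fetch(users, user_id)
    positions = []
    for pos in user["positions"]:
        org = fetch(organizations, pos["org_id"])
        positions.append({"title": pos["title"], "organization": org["name"]})
    return {"name": user["name"], "positions": positions}

profile = load_profile("u1")
print(profile, round_trips)
```

Loading one profile takes three requests here (one for the user plus one per referenced organization), and the logic for resolving references lives in the application rather than in the database.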

2. When a schema needs to be flexible:

No schema means that arbitrary keys and values can be added to a document, and when reading, clients have no guarantees as to what fields the document may contain.

Document databases are schema-on-read (the structure of the data is implicit, and only interpreted when the data is read), in contrast with schema-on-write (the traditional approach of relational databases, where the schema is explicit and the database ensures all written data conforms to it).

The difference between the approaches (schema-on-read and schema-on-write) is particularly noticeable in situations where an application wants to change the format of its data. For example, say you are currently storing each user's full name in one field, and you instead want to store the first name and last name separately. In a document database, you would just start writing new documents with the new fields and have code in the application that handles the case when old documents are read.
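A minimal sketch of such read-time handling might look like the following. The read_user function and the field names name, first_name, and last_name are assumptions for illustration, and splitting on the first space is a simplification that will not suit every name.

```python
def read_user(doc):
    """Return a copy of a user document in the new shape.

    Old documents only carry "name"; new documents carry
    "first_name" and "last_name".
    """
    user = dict(doc)
    if "first_name" not in user and user.get("name"):
        # Old-format document: derive the new fields at read time.
        first, _, last = user["name"].partition(" ")
        user["first_name"] = first
        user["last_name"] = last
    return user

old_doc = {"name": "Ada Lovelace"}                          # written before the change
new_doc = {"first_name": "Grace", "last_name": "Hopper"}    # written after the change

old_result = read_user(old_doc)
new_result = read_user(new_doc)
print(old_result)
print(new_result)
```

The stored documents are never rewritten; the application simply tolerates both shapes whenever it reads.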

On the other hand, in a schema-on-write database, you would typically perform a migration: add the new column to the table, then run an UPDATE that populates it for every existing row.
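Such a migration can be sketched with sqlite3 as follows. The users table and its sample rows are assumptions, and the string functions are SQLite's (instr and substr); other databases offer their own, such as split_part in PostgreSQL or SUBSTRING_INDEX in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace'), ('Grace Hopper')")

# Step 1: change the schema. Existing rows get the default value, NULL.
conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")

# Step 2: backfill every existing row with the text before the first space
# (or the whole name if it contains no space).
conn.execute("""
    UPDATE users
    SET first_name = CASE
        WHEN instr(name, ' ') > 0 THEN substr(name, 1, instr(name, ' ') - 1)
        ELSE name
    END
""")
conn.commit()

migrated = conn.execute("SELECT name, first_name FROM users ORDER BY id").fetchall()
print(migrated)
```

After the migration every row conforms to the new schema, so application code never has to handle the old shape.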

Schema changes have a bad reputation for being slow and requiring downtime.

Running the UPDATE statement on a large table is likely to be slow on any database, since every row needs to be rewritten. If that is not acceptable, the application can leave first_name set to its default of NULL and fill it in at read time, as it would with a document database.

The schema-on-read approach is advantageous if the items in the collection don't all have the same structure for some reason (i.e., the data is heterogeneous), for example because:

There are many different kinds of objects, and it is not practical to put each type of object in its own table.
The structure of the data is determined by the external systems over which you have no control and which may change at any time.

In situations like these, a schema may hurt more than it helps, and schemaless documents can be a much more natural data model.

3. Data locality for queries:

A document is usually stored as a single continuous string, encoded as JSON, XML, or a binary variant thereof (such as MongoDB's BSON). If your application needs to access the entire document (for example, to render it on a web page), there is a performance advantage to this storage locality. If data is split across multiple tables, multiple index lookups are required to retrieve it all, which may require more disk seeks and take more time.
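The contrast can be sketched with plain Python structures. Everything here is hypothetical: document_store stands in for a document database holding one JSON string per profile, and the three dictionaries stand in for three relational tables holding the same data.

```python
import json

# Document model: the whole profile is stored as one encoded string.
document_store = {
    "user:1": json.dumps({
        "name": "Asha",
        "positions": [{"title": "Engineer"}, {"title": "Manager"}],
        "education": [{"school": "State University"}],
    })
}

# Relational-style layout: the same profile split across three tables.
users = {1: {"name": "Asha"}}
positions_by_user = {1: [{"title": "Engineer"}, {"title": "Manager"}]}
education_by_user = {1: [{"school": "State University"}]}

def load_from_document(user_id):
    # One lookup fetches everything, followed by one decode.
    return json.loads(document_store[f"user:{user_id}"])

def load_from_tables(user_id):
    # Three separate lookups, one per table, stitched together afterwards.
    profile = dict(users[user_id])
    profile["positions"] = positions_by_user[user_id]
    profile["education"] = education_by_user[user_id]
    return profile

from_document = load_from_document(1)
from_tables = load_from_tables(1)
print(from_document == from_tables)
```

Both paths produce the same profile; the difference is how many separate lookups are needed to assemble it. The advantage only applies when most of the document is needed at once, because the whole string is loaded even if only a small part is used.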

If your application requires flexibility in data structure and needs to handle semi-structured or unstructured data, a document data model may be a better choice.

Note: For highly interconnected data, the document model is awkward, the relational model is acceptable, and graph models are the most natural.

Graph-Like Data Models

If your application has mostly one-to-many relationships (tree-structured data) or no relationships between records, the document model is appropriate. The relational model can handle simple cases of many-to-many relationships, but as the connections within your data become more complex, it becomes more natural to start modeling your data as a graph.
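As a rough illustration of modeling data as a graph, the sketch below stores hypothetical friendship edges as an adjacency list and uses a breadth-first search to find everyone within two hops of a person. The names and the within_hops function are made up for this example; a real graph database would provide its own query language for this.

```python
from collections import deque

# Hypothetical friendship graph: each vertex maps to its neighbors.
friends = {
    "asha": ["ben", "chen"],
    "ben": ["asha", "dina"],
    "chen": ["asha", "dina"],
    "dina": ["ben", "chen", "eli"],
    "eli": ["dina"],
}

def within_hops(graph, start, max_hops):
    """Breadth-first search: map each vertex within max_hops of start to its hop count."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # the start vertex is not its own friend
    return distance

reachable = within_hops(friends, "asha", 2)
print(reachable)
```

The traversal follows edges directly from vertex to vertex, which is the access pattern graph models are designed around.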

Scenarios where a graph data model can be useful:

Social Network: Social networks involve complex relationships between individuals, communities, organizations, and events. A graph model can represent these relationships effectively and can be used for applications like recommendation engines, personalized marketing, and social media analytics.
Logistics and Supply Chain Management: Logistics and supply chain management involve tracking the flow of goods and information across multiple stakeholders and locations. A graph model can help model these complex relationships and provide better visibility and control over the entire supply chain.
Knowledge Management: Graph models can be used to represent knowledge in a more natural and intuitive way. Concepts can be represented as nodes, and the relationships between concepts can be represented as edges. This can help develop smarter search engines, recommendation systems, and intelligent assistants.

Summary

Historically, data started out being represented as one big tree (the hierarchical model), but that wasn't good for representing many-to-many relationships, so the relational model was invented to solve that problem. Some applications don't fit well in the relational model either. New non-relational "NoSQL" datastores have diverged in two main directions:

Document databases target use cases where data comes in self-contained documents and relationships between one document and another are rare.
Graph databases go in the opposite direction, targeting use cases where anything is potentially related to everything.

All three models (document, relational, and graph) are widely used today, and each is good in its respective domain. One model can be emulated in terms of another model - for example, graph data can be represented in a relational database - but the result is often awkward. That's why we have different systems for different purposes, not a single one-size-fits-all solution.
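To illustrate emulating a graph in a relational database, here is a sketch using sqlite3: edges are rows in a hypothetical edges table, and a recursive common table expression finds every vertex reachable from a starting point. The table, its columns, and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL);
    INSERT INTO edges VALUES
        ('asha', 'ben'), ('ben', 'dina'), ('dina', 'eli'), ('chen', 'dina');
""")

# The recursive query repeatedly joins the edges table against the vertices
# found so far. UNION (rather than UNION ALL) drops duplicates, so the
# recursion also terminates on graphs that contain cycles.
rows = conn.execute("""
    WITH RECURSIVE reachable(name) AS (
        SELECT dst FROM edges WHERE src = ?
        UNION
        SELECT e.dst FROM edges AS e JOIN reachable AS r ON e.src = r.name
    )
    SELECT name FROM reachable ORDER BY name
""", ("asha",)).fetchall()

reachable_names = [name for (name,) in rows]
print(reachable_names)
```

It works, but a variable-length traversal needs a recursive query even for this simple question, which is the kind of awkwardness that motivates dedicated graph models.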

I hope this blog helps you understand when to use which data model.

Thanks.