Thinking@Scale Yan Qi     About     Feed

Understand Data Models

Design Data-Intensive Applications - Chapter 2

A data model describes the data with objects and relationships. Document model was used in IBM’s Information Management System (IMS), the popular database for business data processing in the 1970s. It did quite well when the relationship among objects is one-to-many. When more complex relationships like many-to-many are involved, normalization is necessary to eliminate the duplicates and JOIN is required for data combination. However the document model can support neither normalization nor JOIN, in the efficient way. Two models were introduced to address the challenge by many-to-many relationships: the network model and the relational model.

The network model is a natural generalization of hierarchical or tree structure of the document model. In the network model, an object can link to more than one child and be linked from multiple parents; therefore many-to-one or many-to-many relationships are allowed. To access a record or object, an access path must be provided to follow a path from a root record along these chains of links. It is possible for many paths to reach the same object if many-to-many relationships exist in the network. The developer should keep all these in mind while retrieving any records. Thus the complexity of querying and updating the data can be a huge concern. The idea died down after 1980s.

On the other hand, the relational model flattens the data into a set of tables (or relations), each of which has a collection of rows (or tuples). The key and foreign key are introduced as an attribute or field into tables to build the relationships. JOIN is supported to combine relations with the foreign keys in the query execution. Conceptually it still needs the access path to query any records in the database, but implicitly. The query optimizer can automatically choose the right and shortest path to execute the query. User doesn’t have to worry about how the query optimizer works and can focus on the business logic mostly. Surprisingly the relational model thrives decades and until now it dominates the database world.

There is a chapter in the book, Designing Data-Intensive Applications with an overview of the data models and query languages. Here I presented the chapter, giving more details about the above.