A distributed data structure (DDS) is a self-managing storage layer developed to run on a cluster of workstations and handle the demands of Internet services. A DDS provides the properties such services need: strict data consistency, high throughput, high concurrency, availability, and incremental scalability. Service authors see the interface to a DDS as an ordinary data structure, such as a hash table, tree, or log. Behind this interface, the DDS platform conceals all the mechanisms required to access, partition, replicate, scale, and recover data. Because these complex mechanisms are hidden behind the simple DDS interface, authors developing a new service only have to worry about service-specific logic; the DDS platform handles the difficult problems of managing persistent state.
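As a rough illustration of a simple interface hiding partitioning and replication, here is a minimal in-process sketch. The names `Brick` and `DistributedHashTable`, the hash-based partition choice, and the write-to-every-replica loop are assumptions made for this example only; they are not taken from any actual DDS implementation, and real nodes would be separate machines reached over a network.

```python
import hashlib


class Brick:
    """Hypothetical storage node holding one replica of a partition."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class DistributedHashTable:
    """Toy hash-table facade: callers see put/get/remove and nothing else."""

    def __init__(self, partitions):
        # partitions: list of replica groups, each a list of Brick objects.
        self.partitions = partitions

    def _replicas_for(self, key):
        # Hash the key to pick a partition; every brick in it holds a copy.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def put(self, key, value):
        # Write to every replica of the chosen partition.
        for brick in self._replicas_for(key):
            brick.store[key] = value

    def get(self, key):
        # Replicas hold identical data, so reading the first one is enough.
        return self._replicas_for(key)[0].store.get(key)

    def remove(self, key):
        for brick in self._replicas_for(key):
            brick.store.pop(key, None)


# Four bricks arranged as two partitions with two replicas each.
bricks = [Brick(f"brick-{i}") for i in range(4)]
table = DistributedHashTable([bricks[0:2], bricks[2:4]])

table.put("user:42", {"cart": ["book"]})
print(table.get("user:42"))
```

The calling code never mentions bricks or partitions after construction, which is the point of the abstraction: service logic stays the same if the layout behind the interface changes.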
The DDS is accessible to all cluster nodes, and they all see the same consistent image of it. As long as services keep all persistent state in the DDS, any service instance in the cluster can respond to requests from any client, although we anticipate that clients will have an affinity for specific service instances so that session state can accumulate.
Having a storage layer manage durable state is not new; databases and file systems have done so for a long time. What a DDS does differently becomes clear when databases, distributed file systems, and DDSs are contrasted directly.
The ACID properties of an RDBMS, which come from its use of transactions, give extremely strong durability and consistency guarantees [18], but they can be expensive in terms of complexity and overhead. To lessen the load placed on the RDBMS, Internet services that rely on RDBMS backends frequently take extreme measures [15,21,32]. The high degree of data independence that RDBMSs provide is a powerful abstraction, but it also adds complexity and performance overhead. The many layers of a typical RDBMS (such as SQL parsing, query optimization, and access path selection) allow users to separate the logical structure of their data from its physical organization. This decoupling lets users dynamically construct and issue queries over the data, limited only by the expressiveness of the SQL language.
However, data independence can make parallelization (and hence scaling) difficult in general. From the standpoint of the service properties, an RDBMS always prioritizes consistency over availability. If there are media or processor failures, an RDBMS may become unavailable until the failure is resolved, which is undesirable for Internet services.
Consistency models in file systems are less formally specified. Some file systems provide only minimal consistency guarantees, whereas others ensure that the file system image is consistent across all clients. Locking is often done at the file level. Similarly, distributed file systems scale to different degrees; some do not scale because they rely on centralized file servers. Others, like xFS [3], are entirely serverless and, in theory, may scale to extremely high capacities. File systems are organized as hierarchical directories of files, and files are variable-length arrays of bytes. As a result, file systems provide a rather low-level interface with little data independence. File system clients have direct access to these components (directories and files), and clients are responsible for logically organizing their application data into directories, files, and the bytes contained within those files.
A DDS has a very precisely specified consistency model: all operations on its elements are atomic, meaning that they either complete entirely or not at all. Because DDSs have one-copy equivalence, clients see only a single, logical data item even though data elements in a DDS are replicated. All clients see the same image of a DDS through its interface because two-phase commit is used to keep replicas coherent. Transactions spanning multiple elements or operations are not yet supported. As we shall demonstrate later, many of our existing protocol design decisions and implementation choices take advantage of this lack of transactional support for increased speed and simplicity.
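The all-or-nothing behavior of a single replicated update can be sketched with a basic two-phase commit. This is a minimal illustration under stated assumptions, not the protocol of any particular DDS: the `Replica` class, its `healthy` flag, and the `replicated_put` coordinator function are hypothetical, everything runs in one process, and coordinator crashes, timeouts, and locking are ignored.

```python
class Replica:
    """Hypothetical replica that can stage a write before applying it."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.store = {}    # committed data visible to readers
        self.pending = {}  # writes staged during phase 1

    def prepare(self, key, value):
        # Phase 1: stage the write and vote. An unhealthy replica votes no.
        if not self.healthy:
            return False
        self.pending[key] = value
        return True

    def commit(self, key):
        # Phase 2 (all voted yes): make the staged write visible.
        self.store[key] = self.pending.pop(key)

    def abort(self, key):
        # Phase 2 (someone voted no): discard the staged write, if any.
        self.pending.pop(key, None)


def replicated_put(replicas, key, value):
    """Apply the write on every replica, or on none of them."""
    prepared = []
    for replica in replicas:
        if replica.prepare(key, value):
            prepared.append(replica)
        else:
            # One "no" vote aborts the update everywhere it was staged.
            for staged in prepared:
                staged.abort(key)
            return False
    for replica in replicas:
        replica.commit(key)
    return True


group = [Replica("a"), Replica("b")]
ok = replicated_put(group, "k", 1)

bad_group = [Replica("c"), Replica("d", healthy=False)]
failed = replicated_put(bad_group, "k", 1)

print(ok, failed)  # prints: True False
```

In the failing case no replica ends up holding the value, so a reader never observes replicas that disagree about a single element.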
The interface of a DDS is higher level and more structured than that of a file system. Instead of an arbitrary byte range, the granularity of an operation is a whole data structure element. In contrast to an RDBMS, which defines operations by the set of expressible SQL statements, a DDS fixes the set of operations over its data to a small number of methods provided by the DDS API. A DDS eliminates the query parsing and optimization phases of an RDBMS; however, the DDS interface is less flexible and provides less data independence.
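The difference in granularity can be made concrete with a small contrast. In the sketch below, the record layout, the helper names `write_record` and `read_record`, and the use of a plain `dict` as a stand-in for a DDS hash table are all assumptions chosen for illustration.

```python
import io
import struct

# File-style access: the client invents a record layout and computes offsets.
RECORD = struct.Struct("<I16s")  # hypothetical layout: 4-byte id, 16-byte name


def write_record(f, slot, user_id, name):
    # The client decides where in the byte array each record lives.
    f.seek(slot * RECORD.size)
    f.write(RECORD.pack(user_id, name.encode("utf-8")))


def read_record(f, slot):
    f.seek(slot * RECORD.size)
    user_id, raw = RECORD.unpack(f.read(RECORD.size))
    # Strip the zero padding that struct added to fill the 16-byte field.
    return user_id, raw.rstrip(b"\x00").decode("utf-8")


f = io.BytesIO()  # in-memory stand-in for a file
write_record(f, 0, 7, "ada")
write_record(f, 1, 9, "grace")

# Element-style access: one call reads or writes a whole element by key.
table = {}  # a plain dict stands in for a DDS hash table here
table[7] = "ada"
table[9] = "grace"

print(read_record(f, 1), table[9])
```

With the byte-level interface, the layout and offset arithmetic are the client's problem; with the element-level interface, they are not visible to the client at all.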
By selecting a level of abstraction that lies between an RDBMS and a file system, together with a clear and simple consistency model, we were able to design and build a DDS with all the service properties. In our experience, the DDS interfaces are rich enough to build sophisticated services successfully, even though they are not as general as SQL.
If you want to learn more about data structures and their concepts, visit Learnbay’s data structures and algorithms course, taught by industry tech leaders.