Data Structures

Introduction
Computer Memory and Data Representation
Basic Data Types and Abstract Data Types
Arrays
Records
Linked Lists
List (as an ADT)
Stacks and Queues
Trees
Graphs
Files

Introduction

A program acts on data . The data are values that various variable aspects of a problem may assume.

Example 1 : For a program that can compute the average of n numbers, the actual numbers whose average we would like the program to compute in a particular run, as well as the value of n comprise the data. (If we consider the actual code of the program we are likely to find some more pieces of data.)

The code of a program is based on certain assumptions about the data that is handled at different points during the execution of the program. These assumptions are mainly regarding -

the possible values that each data item may assume
the operations that may be legally performed on the different data items
the way the values are represented in the memory of the computer
the relationship between different data items

The first two points are generally accomodated in the notion of data types . The third, and to some extent, the fourth points are accomodated in the data structures . The fourth point, however, may often be subject to the logic chosen in the program. In the simple example cited above (Example 1), the data type of n is definitely - a natural number, and that of the numbers whose average is to be computed is real number. These data types define the values that are valid for these data items as well as the operations that are legal for these data items, viz., addition, subtraction, multiplication, division, exponentiation, numeric comparisions, etc. We shall shortly discuss the issue of representation of data inside the computer and that of relationship between different data items.

Example 2 : Program to sort n names alphabetically.

In Example 2, we have names as data items. Each name is a string of alphabetic characters (letters). The valid values of an alphabetic letter string and the operations on such strings are different from that of numeric data. So we may consider alphabetic letter string as another data type.

In example 1 it is clear that the given numbers have the relationship that is their similarity of purpose, and it is distinct from the purpose of n. Most likely the program will add the numbers one by one to obtain their sum. This inherent relationship between the numbers leads us to represent them in an array, as we shall see later. The same is true for the list of names in example 2.

Back Top

Computer Memory and Data Representation

The memory of a computer is a sequence of bytes each of which can be individually accessed using addresses. The addresses are nothing but numbers denoting the position of each byte in the sequence. Many programming languages treat addresses as valid data type. Most computers also support accessing 2 or 4 consecutive bytes as a whole using the address of one of the bytes (some even support accessing 8 bytes together). On the other hand, a computer can normally perform basic operations using operands of upto a certain size, say, 2, 4 or 8 bytes. This size is normally referred to as the machine word size. Thus computers are often classified as 16-bit, 32-bit or 64-bit computers.

The memory bytes of a computer can basically hold numbers in binary form. But as we have seen, we need to store other types of values, such as alphabetic letters and strings of such letters. Even certain numeric types require more than the basic accessible units. For this reason certain mapping is used to convert from the users' data values to the numeric format supported by the computers' hardware. For example, for the letters of the Roman alphabet there is the ASCII encoding scheme wherein the letter A is represented internally as 65, B as 66, and so on. Thus each letter of the Roman alphabet is represented in a single byte. For other data types more bytes may be used to represent a single value. For example, to represent a floating point number (say) 8 consecutive bytes may be used, and different parts may have different meanings (characteristic and mantissa).

Back Top

Basic Data Types and Abstract Data Types

Each programming language directly implements certain data types, such as integers, real numbers, alphabetic character, boolean, etc. These data types are referred to as basic data types. But when we try to develop solutions to our real world problems we encounter situations involving other types of data. For instance, in example 2 we need to handle names which are strings of alphabetic characters. From the computer's point of view, or the programming language's point of view, such data types correspond to some kind of abstraction due to the chosen way of developing a solution to a real world problem. Hence these are called abstract data types (ADT).

Example 3 : There are the names of n students and the respective marks that each has obtained in a test.

In Example 3, suppose we need a program that will tell the name of the student who has scored the highest mark in the test. The data types that are required are alphabetic letter string for the names and numeric for the marks. But in this case we need a way to associate each name with the corresponding mark. In other words we are dealing with an abstraction called tuple that is the pair (name, mark) for each student.

Some of the frequently encountered abstract data types are - tuples (also called records), lists, queues, circular queues, stacks, hierarchies (trees), tables, sets, files, graphs, etc.

Most programming languages provide some mechanisms for the user to implement such ADT using the basic data types. In data structures we consider the various possibilities of representing the ADTs using the different options supported by the programming languages. While it may be possible to implement an ADT in more than one way, the choice of implementation affects the convenience of the software developer, efficiency of the program and economy of memory utilization. To select the most suitable implementation of an ADT for a particular program, it is necessary to keep in mind the range of values required to be supported for each data item and the set of operations that are to be performed using the ADTs.

The basic elements of the mechanisms in programming languages for implementing the ADTs are array , record and use of addresses .

Back Top

Arrays

In a program an array refers to multiple data items of the same data type placed together in contiguous memory area. Each data item in an array can be identified by its position in the sequence. The data items of an array can be of any data type - basic or abstract.

Array is one of the basic data structures which can be used to implement several different ADTs.

Accessing the individual data items of an array - Suppose, each data item is of size s bytes and the memory area designated for the array starts at address A. Then the i th data item of the array is located at address A + (i - 1) * s. In general the objects cannot be distinguished in the memory by their contents.

The individual data items in an array must be of the same size, or else the individual data items cannot be accessed using the above expression. This requirement raises certain issues when the individual data items are likely to be variable sizes, eg., names of persons.

Back Top

Records (or tuples)

A record or a tuple means an ordered set of data items pertaining to a single object. For instance in example 3, for each student we can think of a record containing 2 named parts or fields - one for the name of a student and another for his or her marks. The field names are used to refer to each individual data item inside a record.

Memory structure for a record - All the fields of a record are stored one after another in a contiguous memory area. The distance (offset) of a field from the beginning of the record is the former's characteristic, and the field name is actually mapped to this value. Thus if the starting address in memory of a record is known, each field can be accessed by adding the corresponding offset to it. The size of a record is the sum of the sizes of the individual fields. Note that a field can, in turn, be a record of some other type.

Pointer fields in records - A record can also have one or more "pointer", or address fields. These fields are used to locate other data items in the memory.

Like array, record is another basic data structures which can be used to implement several different ADTs.

Back Top

Linked Lists

A linked list is a data structure comprising multiple nodes, each node being of type record with a pointer to another node so that if one of the nodes is designated as the head of the linked list then all the other nodes can be accessed by following the pointers in the node.

Unlike an array, the individual nodes of a linked list can be scattered at different places in the computer's memory. The notion of offset of a node from the beginning of the list does not hold here. The only way to get to a node is using its address. Generally a program does not maintain the address of the individual nodes outside the list. Hence the nodes can be accessed only by traversing the list from the head. This is a time consuming process compared to accessing the individual data items in an array. On the other hand, considering the economy of memory space, linked lists are suitable in situations when the maximum number of data items that may occur can vary greatly and cannot be determined in advance. However, for small number of items, the pointer fields may be an overhead.

Doubly linked list

Back Top