Abstract data types

What is an abstract data type?

An abstract data type (ADT) is an object with a generic description independent of implementation details. This description includes a specification of the components from which the object is made and also the behavioral details of the object. Instances of abstract objects include mathematical objects (like numbers, polynomials, integrals, vectors), physical objects (like pulleys, floating bodies, missiles), animate objects (dogs, Pterodactyls, Indians) and objects (like poverty, honesty, inflation) that are abstract even in the natural language sense. You do not see C in Pterodactyls. Only when you want to simulate a flying Pterodactyl, you would think of using a graphics package in tandem with a computer language. Similarly, inflation is an abstract concept. When you want to model it and want to predict it for the next 10 years, you would think of writing an extrapolation program in C.
Specifying only the components of an object does not suffice. Depending on the problem you are going to solve, you should also identify the properties and behaviors of the object and perhaps additionally the pattern of interaction of the object with other objects of same and/or different types. Thus in order to define an ADT we need to specify:
  • The components of an object of the ADT.
  • A set of procedures that provide the behavioral description of objects belonging to the ADT.
There may be thousands of ways in which a given ADT can be implemented, even when the coding language remains constant. Any such implementation must comply with the content-wise and behavioral description of the ADT.
Examples
  • Integers: An integer is an abstract data type having the standard mathematical meaning. In order that integers may be useful, we also need to specify operations (arithmetic operations, gcd, square root etc.) and relations (ordering, congruence etc.) on integers.
  • Real numbers: There are mathematically rigorous ways of defining real numbers (Dedekind cuts, completion of rational numbers, etc). To avoid these mathematical details, let us plan to represent real numbers by decimal expansions (not necessarily terminating). Real numbers satisfy standard arithmetic and other operations and the usual ordering.
  • Complex numbers: A complex number may be mathematically treated as an ordered pair of real numbers. An understanding of real numbers is then sufficient to represent complex numbers. However, the complex arithmetic is markedly different from the real arithmetic.
  • Polynomials with real (or complex or integer or rational) coefficients with the standard arithmetic.
  • Matrices with real (or complex or integer or rational) entries with the standard matrix arithmetic (which may include dimension, rank, nullity, etc).
  • Sets are unordered collections of elements. We may restrict our study to sets of real (or complex) numbers and talk about union, intersection, complement and other standard operations on sets.
  • multiset is an unordered collection of elements (say, numbers), where each element is allowed to have multiple occurrences. For example, an aquarium is a multiset of fish types. One can add or delete fishes to or from an aquarium.
  • book is an ADT with attributes like name, author(s), ISBN, number of pages, subject, etc. You may think of relations like comparison of difficulty levels of two books.

How to implement an abstract data type?

It is now and only now when you think about writing C codes. Carefully investigate the specification of the ADT and possible target applications where this ADT is going to be used. Plan for suitable C constructs to provide the appropriate functionality with good performance. Try to exploit your experience with C. But fully understand what you are going to implement, the limitations, the expected performance figures, the ease of code maintenance, and a lot of related issues. After all, you have to market your product.
Examples
  • Integers: Oh, my! C provides so many integer variables and still I have to write my integers. Yep! You may have to. For most common-place applications C's built-in integer data types are sufficient. But not always. Suppose my target application is designing a cryptosystem, where one deals with very big integers, like those of bit-sizes one to several thousand bits. Our C's maximum integer length is 64 bits. That is grossly inadequate to address the cryptosystem designer's problem. ANSI standards dictate use of integers of length at most 32 bits, which are even poorer for cryptography, but at the minimum portable across platforms. At any rate, you need your customized integer data types.A common strategy is to break big integers into pieces and store each piece in a built-in data type. To an inexperienced user breaking with respect to the decimal representation seems easy and intuitive. But computer's world is binary. So breaking with respect to the binary representation is much more efficient in terms of space and running time. So we plan to use an array of unsigned long variables to store the bits of a big integer. Each such variable is a 32-bit word and is capable of storing 32 bits of a big integer. Therefore, if we plan to work with integers of size no larger than 10,000 bits, we require an array of size no more than 313 unsigned long variables. The zeroth location of the array holds the least significant 32 bits of a big integer, the first location the next 32 bits, and so on. Since all integers are not necessarily of size 10,000 bits, it is also necessary to store the actual word-size of a big integer. Finally, if we also plan to allow negative integers, we should also reserve a location for storing the sign information. So here is a possible implementation of the big integer data type.
       typedef struct {
          unsigned long words[313];
          unsigned int wordSize;
          unsigned char sign;
       } bigint;
    
    This sounds okay, but has an efficiency problem. When you pass a bigint data to a function, the entire words array is copied element-by-element. That leads to unreasonable overheads during parameter passing. We can instead use an array of 315 unsigned long variables and use its 313-th and 314-th locations to store the size and sign information. The first 313 locations (at indexes 0 through 312) represent the magnitude of the integer as before.
       #define SIZEIDX 313
       #define SIGNIDX 314
       typedef unsigned long goodbigint[315];
    
    Now goodbigint is a simple array and so passing it to a function means only a pointer is passed. Quite efficient, right?
    These big integers are big enough for cryptographic applications, but cannot represent integers bigger than big, for example, integers of bit-size millions to billions. Whenever we use static arrays, we have to put an upper limit on the size. If we have to deal with integers of arbitrary sizes (as long as memory permits), we have no option other than using dynamic memory and allocate the exact amount of memory needed to store a very big integer. But then since the maximum index of the dynamic array is not fixed, we have to store the size and sign information at the beginning of the array. Thus the magnitude of the very big integer is stored starting from the second array index. This leads to somewhat clumsy translation between word indices and array indices.
       #define SIZEIDX 0
       #define SIGNIDX 1
       typedef unsigned long *verybigint;
    
    A better strategy is to use a structure with a dynamic words pointer.
       typedef struct {
          unsigned long *words;
          unsigned int size;
          unsigned char sign;
       } goodverybigint;
    
    So you have to pay a hell lot of attention, when implementation issues come. Good solutions come from experience and innovativeness.
    Being able to define integers for a variety of applications is not enough. We need to do arithmetic (add, subtract, multiply etc.) on these integers. It is beyond the scope of this elementary course to go into the details of these arithmetic routines. It suffices here only to highlight the difference between abstract specifications and application-specific implementations. Both are important.
  • Real numbers: Again C provides built-in implementations of real numbers: floatdouble and long double. If one has to use floating point numbers of higher precision, one has to go for private floating point data types and write arithmetic routines for these new data types. These are again topics too advanced for this course.
  • Complex numbers: If we are happy with real numbers of double precision, the most natural way to define a complex number is the following:
       typedef struct {
          double real;
          double imag;
       } complex;
    
    Let us also illustrate the implementation of some arithmetic routines on complex numbers:
       complex cadd ( complex z1 , complex z2 )
       {
          complex z;
          z.real = z1.real + z2.real;
          z.imag = z1.imag + z2.imag;
          return z;
       }
    
       complex cmul ( complex z1 , comple z2 )
       {
          complex z;
          z.real = z1.real * z2.real - z1.imag * z2.imag;
          z.imag = z1.real * z2.imag + z1.imag * z2.real;
          return z;
       }
    
       complex conj ( complex z1 )
       {
          complex z;
          z.real = z1.real;
          z.imag = -z1.imag;
          return z;
       }
    
       void cprn ( complex z )
       {
          printf("(%lf) + i(%lf)", z.real, z.imag);
       }
    
  • Matrices: Suppose we want to work with matrices having complex entries and suppose that the complex ADT has been defined as above. We may define matrices of bounded sizes as:
       #define MAXROW 10
       #define MAXCOL 15
       typedef struct {
          int rowdim;
          int coldim;
          complex entry[MAXROW][MAXCOL];
       } matrix;
    
    Let us now implement some basic arithmetic operations on these matrices.
       matrix msetid ( int n )
       {
          matrix C;
          int i, j;
    
          if ((n > MAXROW) || (n > MAXCOL)) {
             fprintf(stderr, "msetid: Matrix too big\n");
             C.rowdim = C.coldim = 0;
             return C;
          }
          C.rowdim = C.coldim = n;
          for (i = 0; i < C.rowdim; ++i) {
             for (j = 0; j < C.coldim; ++j) {
                A.entry[i][j].real = (i == j) ? 1 : 0;
                A.entry[i][j].imag = 0;
             }
          }
          return C;
       }
    
       matrix madd ( matrix A , matrix B )
       {
          matrix C;
          int i, j;
    
          if ((A.rowdim != B.rowdim) || (A.coldim != B.coldim)) {
             fprintf(stderr, "madd: Matrices of incompatible dimensions\n");
             C.rowdim = C.coldim = 0;
             return C;
          }
    
          C.rowdim = A.rowdim;
          C.coldim = A.coldim;
          for (i = 0; i < C.rowdim; ++i)
             for (j = 0; j < C.coldim; ++j)
                C.entry[i][j] = cadd(A.entry[i][j],B.entry[i][j]);
          return C;
       }
    
       matrix mmul ( matrix A , matrix B )
       {
          matrix C;
          int i, j, k;
          complex z;
    
          if (A.coldim != B.rowdim) {
             fprintf(stderr, "mmul: Matrices of incompatible dimensions\n");
             C.rowdim = C.coldim = 0;
             return C;
          }
          C.rowdim = A.rowdim;
          C.coldim = B.coldim;
          for (i = 0; i < A.rowdim; ++i) {
             for (j = 0; j < B.coldim; ++j) {
                C.entry[i][j].real = 0;
                C.entry[i][j].imag = 0;
                for (k = 0; k < A.coldim; ++k) {
                   z = cmul(A.entry[i][k], B.entry[k][j]);
                   C.entry[i][j] = cadd(C.entry[i][j],z);
                }
             }
          }
          return C;
       }
    

0 comments:

Post a Comment

My Instagram