Nov 25, 2017

Session IV : Data Types I

This session, Data Types I, covers 13 subtopics:

  • Introduction
  • Primitive Data Types
  • Character String Types
  • User-Defined Ordinal Types
  • Array Types
  • Associative Arrays
  • Record Types
  • Tuple Types
  • List Types
  • Union Types
  • Pointer and Reference Types
  • Type Checking
  • Strong Typing

Introduction

A data type defines a collection of data values and a set of predefined operations on those values. Computer programs produce results by manipulating data. An important factor in determining how easily they can perform this task is how well the data types available in the language match the objects in the real world of the problem being addressed. Therefore, it is crucial that a language support an appropriate collection of data types and structures.

  • A data type defines a collection of data objects and a set of predefined operations on those objects
  • A descriptor is the collection of the attributes of a variable
  • An object represents an instance of a user-defined (abstract data) type

Primitive Data Types

Data types that are not defined in terms of other types are called primitive data types. Nearly all programming languages provide a set of primitive data types. Some of the primitive types are merely reflections of the hardware—for example, most integer types. Others require only a little nonhardware support for their implementation. To provide the structured types, the primitive data types of a language are used, along with one or more type constructors.

  • Integer
  • Floating Point
  • Complex
  • Decimal
  • Boolean
  • Character
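As a rough illustration of the six categories above, here is a minimal Python sketch. The variable names are made up for this example, and Python is only one possible mapping: it has no separate character type, so a one-character string stands in for it.

```python
from decimal import Decimal

count = 42                                  # integer
ratio = 0.1 + 0.2                           # binary floating point, not exactly 0.3
z = complex(3, 4)                           # complex: real and imaginary parts
price = Decimal("0.1") + Decimal("0.2")     # decimal: exact for decimal fractions
flag = count > 10                           # Boolean
letter = "A"                                # stand-in for a character type

print(ratio == 0.3)                # False
print(price == Decimal("0.3"))     # True
print(abs(z))                      # 5.0
print(ord(letter))                 # 65
```

The comparison of ratio and price shows why a decimal type is useful for business data: binary floating point cannot represent 0.1 exactly.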

Character String Types

A character string type is one in which the values consist of sequences of characters. Character string constants are used to label output, and the input and output of all kinds of data are often done in terms of strings. Of course, character strings also are an essential type for all programs that do character manipulation.

The most common string operations are assignment, catenation, substring reference, comparison, and pattern matching. A substring reference is a reference to a substring of a given string. Substring references are discussed in the more general context of arrays, where the substring references are called slices. In general, both assignment and comparison operations on character strings are complicated by the possibility of string operands of different lengths.

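There are several design choices regarding the length of string values. First, the length can be static and set when the string is created. Such a string is called a static length string. This is the choice for the strings of Python, the immutable objects of Java’s String class, as well as similar classes in the C++ standard class library, Ruby’s built-in String class, and the .NET class library available to C# and F#. Second, strings can have varying length up to a declared and fixed maximum set by the variable's definition, as in C. These are called limited dynamic length strings. Third, strings can have varying length with no maximum, as in JavaScript and Perl. These are called dynamic length strings.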
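The common string operations and the fixed nature of a Python string can be sketched as follows. This is only a small example; the names greeting, message, and so on are made up for the illustration.

```python
import re

greeting = "Hello"                      # assignment
message = greeting + ", world"          # catenation
sub = message[7:12]                     # substring reference (a slice)
shorter_first = "abc" < "abcd"          # comparison of operands of different lengths
match = re.search(r"w\w+", message)     # pattern matching

# A Python string cannot be changed in place once it is created.
try:
    greeting[0] = "J"
    mutated = True
except TypeError:
    mutated = False

print(sub, shorter_first, match.group(), mutated)
```

Here the shorter string compares as smaller because it is a prefix of the longer one, which is one common way a language resolves comparisons between operands of different lengths.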

User-Defined Ordinal Types

An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers. In Java, for example, the primitive ordinal types are integer, char, and boolean. There are two user-defined ordinal types that have been supported by programming languages: enumeration and subrange.

An enumeration type is one in which all of the possible values, which are named constants, are provided, or enumerated, in the definition. Enumeration types provide a way of defining and grouping collections of named constants, which are called enumeration constants. In languages that do not have enumeration types, programmers usually simulate them with integer values. Enumeration types can provide advantages in both readability and reliability. Readability is enhanced very directly: Named values are easily recognized, whereas coded values are not.
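A small sketch of an enumeration type, using Python's standard enum module. The Day type and its constants are invented for this example.

```python
from enum import Enum

class Day(Enum):
    MON = 1
    TUE = 2
    WED = 3

today = Day.TUE

# An enumeration constant is not interchangeable with its underlying integer.
is_plain_int = (today == 2)

# A value outside the enumerated set is rejected.
try:
    Day(7)
    accepted = True
except ValueError:
    accepted = False

print(today.name, is_plain_int, accepted)   # TUE False False
```

The name Day.TUE is easier to read than the coded value 2, and the two checks hint at the reliability benefit: the type does not silently accept arbitrary integers.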

A subrange type is a contiguous subsequence of an ordinal type. For example, 12..14 is a subrange of integer type. Subrange types were introduced by Pascal and are included in Ada. There are no design issues that are specific to subrange types. Subrange types enhance readability by making it clear to readers that variables of subrange types can store only certain ranges of values. Reliability is increased with subrange types, because assigning a value to a subrange variable that is outside the specified range is detected as an error, either by the compiler (in the case of the assigned value being a literal value) or by the run-time system (in the case of a variable or expression). It is odd that no contemporary language except Ada has subrange types.
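Python has no built-in subrange type, so the following is only a simulation of the run-time range check described above. The Subrange class and its check method are hypothetical helpers written for this example, not part of any library.

```python
class Subrange:
    """Hypothetical helper: an inclusive integer range checked at run time."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def check(self, value):
        # Reject any value outside low..high, as a run-time system would.
        if not self.low <= value <= self.high:
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        return value

index_range = Subrange(12, 14)
ok = index_range.check(13)

try:
    index_range.check(20)
    rejected = False
except ValueError:
    rejected = True

print(ok, rejected)   # 13 True
```

In a language with real subrange types, such as Ada, the check is generated by the compiler rather than written by hand.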

Array Types

An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element. The individual data elements of an array are of the same type. References to individual array elements are specified using subscript expressions.

Specific elements of an array are referenced by means of a two-level syntactic mechanism, where the first part is the aggregate name, and the second part is a possibly dynamic selector consisting of one or more items known as subscripts or indices. If all of the subscripts in a reference are constants, the selector is static; otherwise, it is dynamic. The selection operation can be thought of as a mapping from the array name and the set of subscript values to an element in the aggregate. Indeed, arrays are sometimes called finite mappings.

The binding of the subscript type to an array variable is usually static, but the subscript value ranges are sometimes dynamically bound.

Subscript Binding and Array Categories

  • Static : one in which the subscript ranges are statically bound and storage allocation is static (done before run time).
  • Fixed stack-dynamic : one in which the subscript ranges are statically bound, but the allocation is done at declaration elaboration time during execution.
  • Stack-dynamic : one in which both the subscript ranges and the storage allocation are dynamically bound at elaboration time.
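  • Fixed heap-dynamic : similar to a fixed stack-dynamic array, in that the subscript ranges and the storage binding are both fixed after storage is allocated. The difference is that both bindings are done when the program requests them during execution, and the storage is allocated from the heap rather than the stack.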
  • Heap-dynamic : one in which the binding of subscript ranges and storage allocation is dynamic and can change any number of times during the array’s lifetime.
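A Python list can serve as an example of the last category, heap-dynamic. The sketch below also shows static and dynamic selectors. The variable names are made up, and note that a Python list does not force its elements to be of one type, so it is only an approximation of the homogeneous array defined above.

```python
scores = [70, 85, 90]

first = scores[0]          # constant subscript: a static selector
i = len(scores) - 1
last = scores[i]           # subscript computed at run time: a dynamic selector

scores.append(100)         # the array grows during its lifetime
del scores[0]              # and shrinks again
window = scores[0:2]       # a slice of the array

# A subscript outside the current range is detected at run time.
try:
    scores[10]
    in_range = True
except IndexError:
    in_range = False

print(first, last, scores, window, in_range)
```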

Associative Arrays

An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys. In the case of non-associative arrays, the indices never need to be stored (because of their regularity). In an associative array, however, the user-defined keys must be stored in the structure. So each element of an associative array is in fact a pair of entities, a key and a value.

In Perl, associative arrays are called hashes, because in the implementation their elements are stored and retrieved with hash functions. The namespace for Perl hashes is distinct: Every hash variable name must begin with a percent sign (%). Each hash element consists of two parts: a key, which is a string, and a value, which is a scalar (number, string, or reference). The implementation of Perl’s associative arrays is optimized for fast lookups, but it also provides relatively fast reorganization when array growth requires it.
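The text describes Perl hashes. The same idea can be sketched with a Python dictionary, which also stores each key together with its value. The names and salary figures below are invented for the example.

```python
salaries = {"Gary": 75000, "Perry": 57000}   # each element is a key/value pair

salaries["Mary"] = 55750        # add an element
del salaries["Gary"]            # remove an element
has_perry = "Perry" in salaries # test whether a key exists
perry_salary = salaries["Perry"]  # look up a value by key

print(sorted(salaries.keys()), has_perry, perry_salary)
```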

Record Types

A record is an aggregate of data elements in which the individual elements are identified by names and accessed through offsets from the beginning of the structure. There is frequently a need in programs to model a collection of data in which the individual elements are not of the same type or size. For example, information about a college student might include name, student number, grade point average, and so forth. A data type for such a collection might use a character string for the name, an integer for the student number, a floating-point number for the grade point average, and so forth. Records are designed for this kind of need.

The fundamental difference between a record and an array is that record elements, or fields, are not referenced by indices. Instead, the fields are named with identifiers, and references to the fields are made using these identifiers. Another difference between arrays and records is that records in some languages are allowed to include unions.

Records are frequently valuable data types in programming languages. The design of record types is straightforward, and their use is safe. Records and arrays are closely related structural forms, and it is therefore interesting to compare them. Arrays are used when all the data values have the same type and/or are processed in the same way. This processing is easily done when there is a systematic way of sequencing through the structure. Such processing is well supported by using dynamic subscripting as the addressing method. Records are used when the collection of data values is heterogeneous and the different fields are not processed in the same way.
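The student example from this section can be sketched as a record using a Python dataclass, next to an array processed by subscript for comparison. The Student type, its field names, and the sample values are assumptions made for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Student:
    name: str            # character string
    student_number: int  # integer
    gpa: float           # floating point

s = Student("Ada", 1001, 3.9)
label = s.name + " #" + str(s.student_number)   # fields are referenced by name

# An array holds values of one type and is processed by stepping a subscript.
grades = [3.9, 3.2, 3.5]
total = 0.0
for i in range(len(grades)):
    total += grades[i]

print(label, s.gpa, round(total, 1))
```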

Tuple Types

A tuple is a data type that is similar to a record, except that the elements are not named. Python includes an immutable tuple type. If a tuple needs to be changed, it can be converted to a list with the list function. After the change, it can be converted back to a tuple with the tuple function. One use of tuples is when a sequence must be write-protected, such as when it is sent as a parameter to an external function and the user does not want the function to be able to modify the parameter. Python’s tuples are closely related to its lists, except that tuples are immutable.
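The round trip through the list and tuple functions can be sketched like this. The variable names are made up for the example.

```python
point = (3, 4)

# A tuple rejects in-place modification.
try:
    point[0] = 10
    changed_in_place = True
except TypeError:
    changed_in_place = False

# To change it, convert to a list, modify, and convert back.
as_list = list(point)
as_list[0] = 10
point = tuple(as_list)

print(point, changed_in_place)   # (10, 4) False
```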

List Types

Lists were first supported in the first functional programming language, LISP. They have always been part of the functional languages, but in recent years they have found their way into some imperative languages.

Union Types

A union is a type whose variables may store different type values at different times during program execution. As an example of the need for a union type, consider a table of constants for a compiler, which is used to store the constants found in a program being compiled. One field of each table entry is for the value of the constant. Suppose that for a particular language being compiled, the types of constants were integer, floating point, and Boolean. In terms of table management, it would be convenient if the same location, a table field, could store a value of any of these three types. Then all constant values could be addressed in the same way. The type of such a location is, in a sense, the union of the three value types it can store. Unions are potentially unsafe constructs in some languages. They are one of the reasons why C and C++ are not strongly typed: These languages do not allow type checking of references to their unions. On the other hand, unions can be safely used, as in their design in Ada, ML, Haskell, and F#.
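The constant table example can be sketched in Python with a type alias from the typing module. This is only an illustration of the idea: the alias Constant and the function describe are names invented here, and the checking is done by hand at run time with isinstance rather than by the language.

```python
from typing import Union

# A table field that may hold any of the three constant types.
Constant = Union[int, float, bool]

def describe(value: Constant) -> str:
    # bool is tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "floating point"
    raise TypeError("not a supported constant type")

table = [42, 3.14, True]
kinds = [describe(v) for v in table]
print(kinds)   # ['integer', 'floating point', 'Boolean']
```

Checking which type is currently stored before using the value is what makes a union safe. A C union leaves that check entirely to the programmer.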

Pointer and Reference Types

A pointer type is one in which the variables have a range of values that consists of memory addresses and a special value, nil. The value nil is not a valid address and is used to indicate that a pointer cannot currently be used to reference a memory cell. Pointers are designed for two distinct kinds of uses. First, pointers provide some of the power of indirect addressing, which is frequently used in assembly language programming. Second, pointers provide a way to manage dynamic storage. A pointer can be used to access a location in an area where storage is dynamically allocated, called a heap. Variables that are dynamically allocated from the heap are called heap-dynamic variables. They often do not have identifiers associated with them and thus can be referenced only by pointer or reference type variables. Variables without names are called anonymous variables. It is in this latter application area of pointers that the most important design issues arise.

Pointers, unlike arrays and records, are not structured types, although they are defined using a type operator (* in C and C++ and access in Ada). Furthermore, they are also different from scalar variables because they are used to reference some other variable, rather than being used to store data. These two categories of variables are called reference types and value types, respectively.
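Python has no explicit pointers, but its variables hold references to heap-allocated objects, so some of the ideas above can still be sketched. The Node class is invented for this example, and None plays the role of nil here.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None      # None: this reference currently refers to nothing

head = Node(1)
head.next = Node(2)           # the second node has no name of its own; it is
                              # reachable only through the reference head.next

alias = head                  # two references to the same object
alias.value = 99              # a change through one is visible through the other

print(head.value, head.next.value, head.next.next)   # 99 2 None
```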

Type Checking

Type checking is the activity of ensuring that the operands of an operator are of compatible types. A compatible type is one that either is legal for the operator or is allowed under language rules to be implicitly converted by compiler-generated code (or the interpreter) to a legal type. This automatic conversion is called a coercion. For example, if an int variable and a float variable are added in Java, the value of the int variable is coerced to float and a floating-point add is done. A type error is the application of an operator to an operand of an inappropriate type. For example, in the original version of C, if an int value was passed to a function that expected a float value, a type error would occur (because compilers for that language did not check the types of parameters). If all bindings of variables to types are static in a language, then type checking can nearly always be done statically. Dynamic type binding requires type checking at run time, which is called dynamic type checking.
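Python binds types dynamically, so it can illustrate both coercion and dynamic type checking. In this small sketch the function risky is a made-up example: its type error is detected only when the offending expression is actually executed.

```python
total = 2 + 3.5          # the int operand is coerced to float before the add

def risky(flag):
    if flag:
        return "abc" - 1   # type error: no subtraction for str and int
    return 0

safe_result = risky(False)   # the bad expression is never reached

try:
    risky(True)
    detected = False
except TypeError:
    detected = True

print(total, safe_result, detected)   # 5.5 0 True
```

A statically checked language would reject the bad expression before the program runs, whichever branch is taken.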

Strong Typing

One of the ideas in language design that became prominent in the so-called structured-programming revolution of the 1970s was strong typing. Strong typing is widely acknowledged as being a highly valuable language characteristic. Unfortunately, it is often loosely defined, and it is often used in computing literature without being defined at all. A programming language is strongly typed if type errors are always detected. This requires that the types of all operands can be determined, either at compile time or at run time. The importance of strong typing lies in its ability to detect all misuses of variables that result in type errors. A strongly typed language also allows the detection, at run time, of uses of the incorrect type values in variables that can store values of more than one type.

The coercion rules of a language have an important effect on the value of type checking. For example, expressions are strongly typed in Java. However, an arithmetic operator with one floating-point operand and one integer operand is legal. The value of the integer operand is coerced to floating-point, and a floating-point operation takes place. This is what is usually intended by the programmer. However, the coercion also results in a loss of one of the benefits of strong typing—error detection.
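The loss of error detection can be sketched with a small example. The text discusses Java; Python applies the same int to float coercion in mixed arithmetic, so it is used here for illustration. The variable names are made up, and the scenario assumes the programmer meant to multiply b by c but typed d by mistake.

```python
b, c = 4, 5      # the intended integer operands
d = 2.5          # an unrelated floating-point variable

a = b * d        # mistake for b * c: legal because b is coerced, so no error

# Without a coercion rule for the operand types, the mistake is caught.
try:
    b + "5"
    caught = False
except TypeError:
    caught = True

print(a, caught)   # 10.0 True
```

The first mistake produces a wrong result silently, while the second one is reported, which is the trade-off the paragraph above describes.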

Written by stevenbudinata in: Uncategorized
