Enforcing Strong Typing

Annex C of MISRA C:2012 summarizes the problem succinctly:

"ISO C may be considered to exhibit poor type safety as it permits a wide range of implicit type conversions to take place. These type conversions can compromise safety as their implementation-defined aspects can cause developer confusion."

The most severe consequences come from inappropriate conversions involving pointer types, as they can cause memory safety violations. Two sections of MISRA C are dedicated to these issues: "Pointer type conversions" (9 rules) and "Pointers and arrays" (8 rules).

Inappropriate conversions between scalar types are only slightly less severe, as they may introduce arbitrary violations of the intended functionality. MISRA C has gone to great lengths to improve the situation, by defining a stricter type system on top of the C language. This is described in Appendix D of MISRA C:2012 and in the dedicated section on "The essential type model" (8 rules).

Enforcing Strong Typing for Pointers

Pointers in C provide a low-level view of the addressable memory as a set of integer addresses. To write at address 42, just go through a pointer:

int main() { int *p = 42; *p = 0; return 0; }

Running this program is likely to hit a segmentation fault on an operating system, or to cause havoc in an embedded system, both because address 42 will not be correctly aligned on a 32-bit or 64-bit machine and because this address is unlikely to correspond to valid addressable data for the application. The compiler might issue a helpful warning on the above code (with option -Wint-conversion implied by -Wall in GCC or LLVM), but note that the warning disappears when explicitly converting value 42 to the target pointer type, although the problem is still present.

Beyond their ability to denote memory addresses, pointers are also used in C to pass references as inputs or outputs to function calls, to construct complex data structures with indirection or sharing, and to denote arrays of elements. Pointers are thus at once pervasive, powerful and fragile.

Pointers Are Not Addresses

In an attempt to rule out issues that come from direct addressing of memory with pointers, MISRA C states in Rule 11.4 that "A conversion should not be performed between a pointer to object and an integer type". As this rule is classified as only Advisory, MISRA C completes it with two Required rules:

  • Rule 11.6: "A cast shall not be performed between pointer to void and an arithmetic type"

  • Rule 11.7: "A cast shall not be performed between pointer to object and a non-integer arithmetic type"

In Ada, pointers are not addresses, and addresses are not integers. An opaque standard type System.Address is used for addresses, and conversions to/from integers are provided in a standard package System.Storage_Elements. The previous C code can be written as follows in Ada:

with System; with System.Storage_Elements; procedure Pointer is A : constant System.Address := System.Storage_Elements.To_Address (42); M : aliased Integer with Address => A; P : constant access Integer := M'Access; begin P.all := 0; end Pointer;

The integer value 42 is converted to a memory address A by calling System.Storage_Elements.To_Address, which is then used as the address of integer variable M. The pointer variable P is set to point to M (which is allowed because M is declared as aliased).

Ada requires more verbiage than C:

  • The integer value 42 must be explicitly converted to type Address

  • To get a pointer to a declared variable such as M, the declaration must be marked as aliased

The added syntax helps first in making clear what is happening and, second, in ensuring that a potentially dangerous feature (assigning to a value at a specific machine address) is not used inadvertently.

The above example is legal in SPARK, but the SPARK analysis tool issues warnings as it cannot control how the program or its environment may update the memory cell at address 42.

Pointers Are Not References

Passing parameters by reference is critical for efficient programs, but the absence of references distinct from pointers in C incurs a serious risk. Any parameter of a pointer type can be copied freely to a variable whose lifetime is longer than the object pointed to, a problem known as "dangling pointers". MISRA C forbids such uses in Rule 18.6: "The address of an object with automatic storage shall not be copied to another object that persists after the first object has ceased to exist". Unfortunately, enforcing this rule is difficult, as it is undecidable.

In SPARK, parameters can be passed by reference, but no pointer to the parameter can be stored past the return point of the function, which completely solves this issue. In fact, the decision to pass a parameter by copy or by reference rests in many cases with the compiler, but such compiler dependency has no effect on the functional behavior of a SPARK program. In the example below, the compiler may decide to pass parameter P of procedure Rotate_X either by copy or by reference, but regardless of the choice the postcondition of Rotate_X will hold: the final value of P will be modified by rotation around the X axis.

package Geometry is type Point_3D is record X, Y, Z : Float; end record; procedure Rotate_X (P : in out Point_3D) with Post => P = P'Old'Update (Y => P.Z'Old, Z => -P.Y'Old); end Geometry;
package body Geometry is procedure Rotate_X (P : in out Point_3D) is Tmp : constant Float := P.Y; begin P.Y := P.Z; P.Z := -Tmp; end Rotate_X; end Geometry;

SPARK's analysis tool can mathematically prove that the postcondition is true.

Pointers Are Not Arrays

The greatest source of vulnerabilities regarding pointers is their use as substitutes for arrays. Although the C language has a syntax for declaring and accessing arrays, this is just a thin syntactic layer on top of pointers. Thus:

  • Array access is just pointer arithmetic;

  • If a function is to manipulate an array then the array's length must be separately passed as a parameter; and

  • The program is susceptible to the various vulnerabilities originating from the confusion of pointers and arrays, such as buffer overflow.

Consider a function that counts the number of times a value is present in an array. In C, this could be written:

#include <stdio.h> int count(int *p, int len, int v) { int count = 0; while (len--) { if (*p++ == v) { count++; } } return count; } int main() { int p[5] = {0, 3, 9, 3, 3}; int c = count(p, 5, 3); printf("value 3 is seen %d times in p\n", c); return 0; }

Function count has no control over the range of addresses accessed from pointer p. The critical property that the len parameter is a valid length for an array of integers pointed to by parameter p rests completely with the caller of count, and count has no way to check that this is true.

To mitigate the risks associated with pointers being used for arrays, MISRA C contains eight rules in a section on "Pointers and arrays". These rules forbid pointer arithmetic (Rule 18.4) or, if this Advisory rule is not followed, require pointer arithmetic to stay within bounds (Rule 18.1). But, even if we rewrite the loop in count to respect all decidable MISRA C rules, the program's correctness still depends on the caller of count passing a correct value of len:

#include <stdio.h> int count(int *p, int len, int v) { int count = 0; for (int i = 0; i < len; i++) { if (p[i] == v) { count++; } } return count; } int main() { int p[5] = {0, 3, 9, 3, 3}; int c = count(p, 5, 3); printf("value 3 is seen %d times in p\n", c); return 0; }

The resulting code is more readable, but still vulnerable to incorrect values of parameter len passed by the caller of count, which violates undecidable MISRA C Rules 18.1 (pointer arithmetic should stay within bounds) and 1.3 (no undefined behavior). Contrast this with the same function in SPARK (and Ada):

package Types is type Int_Array is array (Positive range <>) of Integer; end Types;
with Types; use Types; function Count (P : Int_Array; V : Integer) return Natural;
function Count (P : Int_Array; V : Integer) return Natural is Count : Natural := 0; begin for I in P'Range loop if P (I) = V then Count := Count + 1; end if; end loop; return Count; end Count;
with Ada.Text_IO; use Ada.Text_IO; with Types; use Types; with Count; procedure Test_Count is P : constant Int_Array := (0, 3, 9, 3, 3); C : constant Integer := Count (P, 3); begin Put_Line ("value 3 is seen" & C'Img & " times in p"); end Test_Count;

The array parameter P is not simply a homogeneous sequence of Integer values. The compiler must represent P so that its lower and upper bounds (P'First and P'Last) and thus also its length (P'Length) can be retrieved. Function Count can simply loop over the range of valid array indexes P'First .. P'Last (or P'Range for short). As a result, function Count can be verified in isolation to be free of vulnerabilities such as buffer overflow, as it does not depend on the values of parameters passed by its callers. In fact, we can go further in SPARK and show that the value returned by Count is no greater than the length of parameter P by stating this property in the postcondition of Count and asking the SPARK analysis tool to prove it:

package Types is type Int_Array is array (Positive range <>) of Integer; end Types;
with Types; use Types; function Count (P : Int_Array; V : Integer) return Natural with Post => Count'Result <= P'Length;
function Count (P : Int_Array; V : Integer) return Natural is Count : Natural := 0; begin for I in P'Range loop pragma Loop_Invariant (Count <= I - P'First); if P (I) = V then Count := Count + 1; end if; end loop; return Count; end Count;

The only help that SPARK analysis required from the programmer, in order to prove the postcondition, is a loop invariant (a special kind of assertion) that reflects the value of Count at each iteration.

Pointers Should Be Typed

The C language defines a special pointer type void* that corresponds to an untyped pointer. It is legal to convert any pointer type to and from void*, which makes it a convenient way to simulate C++ style templates. Consider the following code which indirectly applies assign_int to integer i and assign_float to floating-point f by calling assign on both:

#include <stdio.h> void assign_int (int *p) { *p = 42; } void assign_float (float *p) { *p = 42.0; } typedef void (*assign_fun)(void *p); void assign(assign_fun fun, void *p) { fun(p); } int main() { int i; float f; assign((assign_fun)&assign_int, &i); assign((assign_fun)&assign_float, &f); printf("i = %d; f = %f\n", i, f); }

The references to the variables i and f are implicitly converted to the void* type as a way to apply assign to any second parameter p whose type matches the argument type of its first argument fun. The use of an untyped argument means that the responsibility for the correct typing rests completely with the programmer. Swap i and f in the calls to assign and you still get a compilable program without warnings, that runs and produces completely bogus output:

i = 1109917696; f = 0.000000

instead of the expected:

i = 42; f = 42.000000

Generics in SPARK (and Ada) can implement the desired functionality in a fully typed way, with any errors caught at compile time, where procedure Assign applies its parameter procedure Initialize to its parameter V:

generic type T is private; with procedure Initialize (V : out T); procedure Assign (V : out T);
procedure Assign (V : out T) is begin Initialize (V); end Assign;
with Ada.Text_IO; use Ada.Text_IO; with Assign; procedure Apply_Assign is procedure Assign_Int (V : out Integer) is begin V := 42; end Assign_Int; procedure Assign_Float (V : out Float) is begin V := 42.0; end Assign_Float; procedure Assign_I is new Assign (Integer, Assign_Int); procedure Assign_F is new Assign (Float, Assign_Float); I : Integer; F : Float; begin Assign_I (I); Assign_F (F); Put_Line ("I =" & I'Img & "; F =" & F'Img); end Apply_Assign;

The generic procedure Assign must be instantiated with a specific type for T and a specific procedure (taking a single out parameter of this type) for Initialize. The procedure resulting from the instantiation applies to a variable of this type. So switching I and F above would result in an error detected by the compiler. Likewise, an instantiation such as the following would also be a compile-time error:

procedure Assign_I is new Assign (Integer, Assign_Float);

Enforcing Strong Typing for Scalars

In C, all scalar types can be converted both implicitly and explicitly to any other scalar type. The semantics is defined by rules of promotion and conversion, which can confuse even experts. One example was noted earlier, in the Preface. Another example appears in an article introducing a safe library for manipulating scalars by Microsoft expert David LeBlanc. In its conclusion, the author acknowledges the inherent difficulty in understanding scalar type conversions in C, by showing an early buggy version of the code to produce the minimum signed integer:

return (T)(1 << (BitCount()-1));

The issue here is that the literal 1 on the left-hand side of the shift is an int, so on a 64-bit machine with 32-bit int and 64-bit type T, the above is shifting 32-bit value 1 by 63 bits. This is a case of undefined behavior, producing an unexpected output with the Microsoft compiler. The correction is to convert the first literal 1 to T before the shift:

return (T)((T)1 << (BitCount()-1));

Although he'd asked some expert programmers to review the code, no one found this problem.

To avoid these issues as much as possible, MISRA C defines its own type system on top of C types, in the section on "The essential type model" (eight rules). These can be seen as additional typing rules, since all rules in this section are decidable, and can be enforced at the level of a single translation unit. These rules forbid in particular the confusing cases mentioned above. They can be divided into three sets of rules:

  • restricting operations on types

  • restricting explicit conversions

  • restricting implicit conversions

Restricting Operations on Types

Apart from the application of some operations to floating-point arguments (the bitwise, mod and array access operations) which are invalid and reported by the compiler, all operations apply to all scalar types in C. MISRA C Rule 10.1 constrains the types on which each operation is possible as follows.

Arithmetic Operations on Arithmetic Types

Adding two Boolean values, or an Apple and an Orange, might sound like a bad idea, but it is easily done in C:

#include <stdbool.h> #include <stdio.h> int main() { bool b1 = true; bool b2 = false; bool b3 = b1 + b2; typedef enum {Apple, Orange} fruit; fruit f1 = Apple; fruit f2 = Orange; fruit f3 = f1 + f2; printf("b3 = %d; f3 = %d\n", b3, f3); return 0; }

No error from the compiler here. In fact, there is no undefined behavior in the above code. Variables b3 and f3 both end up with value 1. Of course it makes no sense to add Boolean or enumerated values, and thus MISRA C Rule 18.1 forbids the use of all arithmetic operations on Boolean and enumerated values, while also forbidding most arithmetic operations on characters. That leaves the use of arithmetic operations for signed or unsigned integers as well as floating-point types and the use of modulo operation % for signed or unsigned integers.

Here's an attempt to simulate the above C code in SPARK (and Ada):

package Bad_Arith is B1 : constant Boolean := True; B2 : constant Boolean := False; B3 : constant Boolean := B1 + B2; type Fruit is (Apple, Orange); F1 : constant Fruit := Apple; F2 : constant Fruit := Orange; F3 : constant Fruit := F1 + F2; end Bad_Arith;

Here is the output from AdaCore's GNAT compiler:

 1.     package Bad_Arith is
 3.        B1 : constant Boolean := True;
 4.        B2 : constant Boolean := False;
 5.        B3 : constant Boolean := B1 + B2;
    >>> there is no applicable operator "+" for type "Standard.Boolean"

 7.        type Fruit is (Apple, Orange);
 8.        F1 : constant Fruit := Apple;
 9.        F2 : constant Fruit := Orange;
10.        F3 : constant Fruit := F1 + F2;
    >>> there is no applicable operator "+" for type "Fruit" defined at line 7

12.     end Bad_Arith;

It is possible, however, to get the predecessor of a Boolean or enumerated value with Value'Pred and its successor with Value'Succ, as well as to iterate over all values of the type:

with Ada.Text_IO; use Ada.Text_IO; procedure Ok_Arith is B1 : constant Boolean := False; B2 : constant Boolean := Boolean'Succ (B1); B3 : constant Boolean := Boolean'Pred (B2); type Fruit is (Apple, Orange); F1 : constant Fruit := Apple; F2 : constant Fruit := Fruit'Succ (F1); F3 : constant Fruit := Fruit'Pred (F2); begin pragma Assert (B1 = B3); pragma Assert (F1 = F3); for B in Boolean loop Put_Line (B'Img); end loop; for F in Fruit loop Put_Line (F'Img); end loop; end Ok_Arith;

Boolean Operations on Boolean

"Two bee or not two bee? Let's C":

#include <stdbool.h> #include <stdio.h> int main() { typedef enum {Ape, Bee, Cat} Animal; bool answer = (2 * Bee) || ! (2 * Bee); printf("two bee or not two bee? %d\n", answer); return 0; }

The answer to the question posed by Shakespeare's Hamlet is 1, since it reduces to A or not A and this is true in classical logic.

As previously noted, MISRA C forbids the use of the multiplication operator with an operand of an enumerated type. Rule 18.1 also forbids the use of Boolean operations "and", "or", and "not" (&&, ||, !, respectively, in C) on anything other than Boolean operands. It would thus prohibit the Shakespearian code above.

Below is an attempt to express the same code in SPARK (and Ada), where the Boolean operators are and, or, and not. The and and or operators evaluate both operands, and the language also supplies short-circuit forms that evaluate the left operand and only evaluate the right operand when its value may affect the result.

package Bad_Hamlet is type Animal is (Ape, Bee, Cat); Answer : Boolean := 2 * Bee or not 2 * Bee; -- Illegal end Bad_Hamlet;

As expected, the compiler rejects this code. There is no available * operation that works on an enumeration type, and likewise no available or or not operation.

Bitwise Operations on Unsigned Integers

Here's a genetic engineering example that combines a Bee with a Dog to produce a Cat, by manipulating the atomic structure (the bits in its representation):

#include <stdbool.h> #include <assert.h> int main() { typedef enum {Ape, Bee, Cat, Dog} Animal; Animal mutant = Bee ^ Dog; assert (mutant == Cat); return 0; }

This algorithm works by accessing the underlying bitwise representation of Bee and Dog (0x01 and 0x03, respectively) and, by applying the exclusive-or operator ^, transforming it into the underlying bitwise representation of a Cat (0x02). While powerful, manipulating the bits in the representation of values is best reserved for unsigned integers as illustrated in the book Hacker's Delight. MISRA C Rule 18.1 thus forbids the use of all bitwise operations on anything but unsigned integers.

Below is an attempt to do the same in SPARK (and Ada). The bitwise operators are and, or, xor, and not, and the related bitwise functions are Shift_Left, Shift_Right, Shift_Right_Arithmetic, Rotate_Left and Rotate_Right:

package Bad_Genetics is type Animal is (Ape, Bee, Cat, Dog); Mutant : Animal := Bee xor Dog; -- ERROR pragma Assert (Mutant = Cat); end Bad_Genetics;

The declaration of Mutant is illegal, since the xor operator is only available for Boolean and unsigned integer (modular) values; it is not available for Animal. The same restriction applies to the other bitwise operators listed above. If we really wanted to achieve the effect of the above code in legal SPARK (or Ada), then the following approach will work (the type Unsigned_8 is an 8-bit modular type declared in the predefined package Interfaces).

with Interfaces; use Interfaces; package Unethical_Genetics is type Animal is (Ape, Bee, Cat, Dog); A : constant array (Animal) of Unsigned_8 := (Animal'Pos (Ape), Animal'Pos (Bee), Animal'Pos (Cat), Animal'Pos (Dog)); Mutant : Animal := Animal'Val (A (Bee) xor A (Dog)); pragma Assert (Mutant = Cat); end Unethical_Genetics;

Note that and, or, not and xor are used both as logical operators and as bitwise operators, but there is no possible confusion between these two uses. Indeed the use of such operators on values from modular types is a natural generalization of their uses on Boolean, since values from modular types are often interpreted as arrays of Booleans.

Restricting Explicit Conversions

A simple way to bypass the restrictions of Rule 10.1 is to explicitly convert the arguments of an operation to a type that the rule allows. While it can often be useful to cast a value from one type to another, many casts that are allowed in C are either downright errors or poor replacements for clearer syntax.

One example is to cast from a scalar type to Boolean. A better way to express (bool)x is to compare x to the zero value of its type: x != 0 for integers, x != 0.0 for floats, x != '0' for characters, x != Enum where Enum is the first enumerated value of the type. Thus, MISRA C Rule 10.5 advises avoiding casting non-Boolean values to Boolean.

Rule 10.5 also advises avoiding other casts that are, at best, obscure:

  • from a Boolean to any other scalar type

  • from a floating-point value to an enumeration or a character

  • from any scalar type to an enumeration

The rules are not symmetric, so although a float should not be cast to an enum, casting an enum to a float is allowed. Similarly, although it is advised to not cast a character to an enum, casting an enum to a character is allowed.

The rules in SPARK are simpler. There are no conversions between numeric types (integers, fixed-point and floating-point) and non-numeric types (such as Boolean, Character, and other enumeration types). Conversions between different non-numeric types are limited to those that make semantic sense, for example between a derived type and its parent type. Any numeric type can be converted to any other numeric type, with precise rules for rounding/truncating values when needed and run-time checking that the converted value is in the range associated with the target type.

Restricting Implicit Conversions

Rules 10.1 and 10.5 restrict operations on types and explicit conversions. That's not enough to avoid problematic C programs; a program violating one of these rules can be expressed using only implicit type conversions. For example, the Shakespearian code in section Boolean Operations on Boolean can be reformulated to satisfy both Rules 10.1 and 10.5:

#include <stdbool.h> #include <stdio.h> int main() { typedef enum {Ape, Bee, Cat} Animal; int b = Bee; bool t = 2 * b; bool answer = t || ! t; printf("two bee or not two bee? %d\n", answer); return 0; }

Here, we're implicitly converting the enumerated value Bee to an int, and then implicitly converting the integer value 2 * b to a Boolean. This does not violate 10.1 or 10.5, but it is prohibited by MISRA C Rule 10.3: "The value of an expression shall not be assigned to an object with a narrower essential type or of a different essential type category".

Rule 10.1 also does not prevent arguments of an operation from being inconsistent, for example comparing a floating-point value and an enumerated value. But MISRA C Rule 10.4 handles this situation: "Both operands of an operator in which the usual arithmetic conversions are performed shall have the same essential type category".

In addition, three rules in the "Composite operators and expressions" section avoid common mistakes related to the combination of explicit/implicit conversions and operations.

The rules in SPARK (and Ada) are far simpler: there are no implicit conversions! This applies both between types of a different essential type category as MISRA C puts it, as well as between types that are structurally the same but declared as different types.

procedure Bad_Conversions is pragma Warnings (Off); F : Float := 0.0; I : Integer := 0; type Animal is (Ape, Bee, Cat); type My_Animal is new Animal; -- derived type A : Animal := Cat; M : My_Animal := Bee; B : Boolean := True; C : Character := 'a'; begin F := I; -- ERROR I := A; -- ERROR A := B; -- ERROR M := A; -- ERROR B := C; -- ERROR C := F; -- ERROR end Bad_Conversions;

The compiler reports a mismatch on every statement in the above procedure (the declarations are all legal).

Adding explicit conversions makes the assignments to F and M valid, since SPARK (and Ada) allow conversions between numeric types and between a derived type and its parent type, but all other conversions are illegal:

procedure Bad_Conversions is pragma Warnings (Off); F : Float := 0.0; I : Integer := 0; type Animal is (Ape, Bee, Cat); type My_Animal is new Animal; -- derived type A : Animal := Cat; M : My_Animal := Bee; B : Boolean := True; C : Character := 'a'; begin F := Float (I); -- OK I := Integer (A); -- ERROR A := Animal (B); -- ERROR M := My_Animal (A); -- OK B := Boolean (C); -- ERROR C := Character (F); -- ERROR end Bad_Conversions;

Although an enumeration value cannot be converted to an integer (or vice versa) either implicitly or explicitly, SPARK (and Ada) provide functions to obtain the effect of a type conversion. For any enumeration type T, the function T'Pos(e) takes an enumeration value from type T and returns its relative position as an integer, starting at 0. For example, Animal'Pos(Bee) is 1, and Boolean'Pos(False) is 0. In the other direction, T'Val(n), where n is an integer, returns the enumeration value in type T at relative position n. If n is negative or greater then T'Pos(T'Last) then a run-time exception is raised.

Hence, the following is valid SPARK (and Ada) code; Character is defined as an enumeration type:

procedure Ok_Conversions is pragma Warnings (Off); F : Float := 0.0; I : Integer := 0; type Animal is (Ape, Bee, Cat); type My_Animal is new Animal; A : Animal := Cat; M : My_Animal := Bee; B : Boolean := True; C : Character := 'a'; begin F := Float (I); I := Animal'Pos (A); I := My_Animal'Pos (M); I := Boolean'Pos (B); I := Character'Pos (C); I := Integer (F); A := Animal'Val (2); end Ok_Conversions;