Enforcing Basic Syntactic Guarantees

C's syntax is concise but also very permissive, which makes it easy to write programs whose effect is not what was intended. MISRA C contains guidelines to:

  • clearly distinguish code from comments
  • specially handle function parameters and result
  • ensure that control structures are not abused

Distinguishing Code and Comments

The problem arises from block comments in C, starting with /* and ending with */. These comments do not nest with other block comments or with line comments. For example, consider a block comment surrounding three lines that each increase variable a by one:

/*
++a;
++a;
++a; */

Now consider what happens if the first line is commented out using a block comment and the third line is commented out using a line comment (also known as a C++ style comment, allowed in C since C99):

/*
/* ++a; */
++a;
// ++a; */

The result of commenting out code that was already commented out is that the second line of code becomes live! Of course, the above example is simplified, but similar situations do arise in practice, which is the reason for MISRA C Directive 4.1 "Sections of code should not be 'commented out'". This is reinforced with Rules 3.1 and 3.2 from the section on "Comments" that forbid in particular the use of /* inside a comment like we did above.

These situations cannot arise in SPARK (or in Ada), as only line comments are permitted, using --:

--  A := A + 1;
--  A := A + 1;
--  A := A + 1;

So commenting again the first and third lines does not change the effect:

--  --  A := A + 1;
--  A := A + 1;
--  --  A := A + 1;

Specially Handling Function Parameters and Result

Handling the Result of Function Calls

It is possible in C to ignore the result of a function call, either implicitly or else explicitly by converting the result to void:

f();
(void)f();

This is particularly dangerous when the function returns an error status, as the caller is then ignoring the possibility of errors in the callee. Thus the MISRA C Directive 4.7: "If a function returns error information, then that error information shall be tested". In the general case of a function returning a result which is not an error status, MISRA C Rule 17.7 states that "The value returned by a function having non-void return type shall be used", where an explicit conversion to void counts as a use.

In SPARK, as in Ada, the result of a function call must be used, for example by assigning it to a variable or by passing it as a parameter, in contrast with procedures (which are equivalent to void-returning functions in C). SPARK analysis also checks that the result of the function is actually used to influence an output of the calling subprogram. For example, the first two calls to F in the following are detected as unused, even though the result of the function call is assigned to a variable, which is itself used in the second case:

package Fun is function F return Integer is (1); end Fun;
with Fun; use Fun; procedure Use_F (Z : out Integer) is X, Y : Integer; begin X := F; Y := F; X := Y; Z := F; end Use_F;

Only the result of the third call is used to influence the value of an output of Use_F, here the output parameter Z of the procedure.

Handling Function Parameters

In C, function parameters are treated as local variables of the function. They can be modified, but these modifications won't be visible outside the function. This is an opportunity for mistakes. For example, the following code, which appears to swap the values of its parameters, has in reality no effect:

void swap (int x, int y) {
   int tmp = x;
   x = y;
   y = tmp;
}

MISRA C Rule 17.8 prevents such mistakes by stating that "A function parameter should not be modified".

No such rule is needed in SPARK, since function parameters are only inputs so cannot be modified, and procedure parameters have a mode defining whether they can be modified or not. Only parameters of mode out or in out can be modified -- and these are prohibited from functions in SPARK -- and their modification is visible at the calling site. For example, assigning to a parameter of mode in (the default parameter mode if omitted) results in compilation errors:

procedure Swap (X, Y : Integer) is Tmp : Integer := X; begin X := Y; -- ERROR Y := Tmp; -- ERROR end Swap;

Here is the output of AdaCore's GNAT compiler:

1.     procedure Swap (X, Y : Integer) is
2.        Tmp : Integer := X;
3.     begin
4.        X := Y;  --  ERROR
          |
   >>> assignment to "in" mode parameter not allowed

5.        Y := Tmp;  --  ERROR
          |
   >>> assignment to "in" mode parameter not allowed

6.     end Swap;

The correct version of Swap in SPARK takes parameters of mode in out:

procedure Swap (X, Y : in out Integer) is Tmp : constant Integer := X; begin X := Y; Y := Tmp; end Swap;

Ensuring Control Structures Are Not Abused

The previous issue (ignoring the result of a function call) is an example of a control structure being abused, due to the permissive syntax of C. There are many such examples, and MISRA C contains a number of guidelines to prevent such abuse.

Preventing the Semicolon Mistake

Because a semicolon can act as a statement, and because an if-statement and a loop accept a simple statement (possibly only a semicolon) as body, inserting a single semicolon can completely change the behavior of the code:

int func() {
   if (0)
      return 1;
   while (1)
      return 0;
   return 0;
}

As written, the code above returns with status 0. If a semicolon is added after the first line (if (0);), then the code returns with status 1. If a semicolon is added instead after the third line (while (1);), then the code does not return. To prevent such surprises, MISRA C Rule 15.6 states that "The body of an iteration-statement or a selection-statement shall be a compound statement" so that the code above must be written:

int func() {
   if (0) {
      return 1;
   }
   while (1) {
      return 0;
   }
   return 0;
}

Note that adding a semicolon after the test of the if or while statement has the same effect as before! But doing so would violate MISRA C Rule 15.6.

In SPARK, the semicolon is not a statement by itself, but rather a marker that terminates a statement. The null statement is an explicit null;, and all blocks of statements have explicit begin and end markers, which prevents mistakes that are possible in C. The SPARK (also Ada) version of the above C code is as follows:

function Func return Integer is begin if False then return 1; end if; while True loop return 0; end loop; return 0; end Func;

Avoiding Complex Switch Statements

Switch statements are well-known for being easily misused. Control can jump to any case section in the body of the switch, which in C can be before any statement contained in the body of the switch. At the end of the sequence of statements associated with a case, execution continues with the code that follows unless a break is encountered. This is a recipe for mistakes, and MISRA C enforces a simpler well-formed syntax for switch statements defined in Rule 16.1: "All switch statements shall be well-formed".

The other rules in the section on "Switch statements" go on detailing individual consequences of Rule 16.1. For example Rule 16.3 forbids the fall-through from one case to the next: "An unconditional break statement shall terminate every switch-clause". As another example, Rule 16.4 mandates the presence of a default case to handle cases not taken into account explicitly: "Every switch statement shall have a default label".

The analog of the C switch statements in SPARK (and in Ada) is the case statement. This statement has a simpler and more robust structure than the C switch, with control automatically exiting after one of the case alternatives is executed, and the compiler checking that the alternatives are disjoint (like in C) and complete (unlike in C). So the following code is rejected by the compiler:

package Sign_Domain is type Sign is (Negative, Zero, Positive); function Opposite (A : Sign) return Sign is (case A is -- ERROR when Negative => Positive, when Positive => Negative); function Multiply (A, B : Sign) return Sign is (case A is when Negative => Opposite (B), when Zero | Positive => Zero, when Positive => B); -- ERROR procedure Get_Sign (X : Integer; S : out Sign); end Sign_Domain;
package body Sign_Domain is procedure Get_Sign (X : Integer; S : out Sign) is begin case X is when 0 => S := Zero; when others => S := Negative; -- ERROR when 1 .. Integer'Last => S := Positive; end case; end Get_Sign; end Sign_Domain;

The error in function Opposite is that the when choices do not cover all values of the target expression. Here, A is of the enumeration type Sign, so all three values of the enumeration must be covered.

The error in function Multiply is that Positive is covered twice, in the second and the third alternatives. This is not allowed.

The error in procedure Get_Sign is that the others choice (the equivalent of C default case) must come last. Note that an others choice would be useless in Opposite and Multiply, as all Sign values are covered.

Here is a correct version of the same code:

package Sign_Domain is type Sign is (Negative, Zero, Positive); function Opposite (A : Sign) return Sign is (case A is when Negative => Positive, when Zero => Zero, when Positive => Negative); function Multiply (A, B : Sign) return Sign is (case A is when Negative => Opposite (B), when Zero => Zero, when Positive => B); procedure Get_Sign (X : Integer; S : out Sign); end Sign_Domain;
package body Sign_Domain is procedure Get_Sign (X : Integer; S : out Sign) is begin case X is when 0 => S := Zero; when 1 .. Integer'Last => S := Positive; when others => S := Negative; end case; end Get_Sign; end Sign_Domain;

Avoiding Complex Loops

Similarly to C switches, for-loops in C can become unreadable. MISRA C thus enforces a simpler well-formed syntax for for-loops, defined in Rule 14.2: "A for loop shall be well-formed". The main effect of this simplification is that for-loops in C look like for-loops in SPARK (and in Ada), with a loop counter that is incremented or decremented at each iteration. Section 8.14 defines precisely what a loop counter is:

  1. It has a scalar type;
  2. Its value varies monotonically on each loop iteration; and
  3. It is used in a decision to exit the loop.

In particular, Rule 14.2 forbids any modification of the loop counter inside the loop body. Here's the example used in MISRA C:2012 to illustrate this rule:

bool_t flag = false;

for ( int16_t i = 0; ( i < 5 ) && !flag; i++ )
{
  if ( C )
  {
    flag = true; /* Compliant - allows early termination of loop */
  }

  i = i + 3;     /* Non-compliant - altering the loop counter */
}

The equivalent SPARK (and Ada) code does not compile, because of the attempt to modify the value of the loop counter:

procedure Well_Formed_Loop (C : Boolean) is Flag : Boolean := False; begin for I in 0 .. 4 loop exit when not Flag; if C then Flag := True; end if; I := I + 3; -- ERROR end loop; end Well_Formed_Loop;

Removing the problematic line leads to a valid program. Note that the additional condition being tested in the C for-loop has been moved to a separate exit statement at the start of the loop body.

SPARK (and Ada) loops can increase (or, with explicit syntax, decrease) the loop counter by 1 at each iteration.

for I in reverse 0 .. 4 loop
   ... -- Successive values of I are 4, 3, 2, 1, 0
end loop;

SPARK loops can iterate over any discrete type; i.e., integers as above or enumerations:

type Sign is (Negative, Zero, Positive);

for S in Sign loop
  ...
end loop;

Avoiding the Dangling Else Issue

C does not provide a closing symbol for an if-statement. This makes it possible to write the following code, which appears to try to return the absolute value of its argument, while it actually does the opposite:

#include <stdio.h> int absval (int x) { int result = x; if (x >= 0) if (x == 0) result = 0; else result = -x; return result; } int main() { printf("absval(5) = %d\n", absval(5)); printf("absval(0) = %d\n", absval(0)); printf("absval(-10) = %d\n", absval(-10)); }

The warning issued by GCC or LLVM with option -Wdangling-else (implied by -Wall) gives a clue about the problem: although the else branch is written as though it completes the outer if-statement, in fact it completes the inner if-statement.

MISRA C Rule 15.6 avoids the problem: "The body of an iteration-statement or a selection-statement shall be a compound statement". That's the same rule as the one shown earlier for Preventing the Semicolon Mistake. So the code for absval must be written:

#include <stdio.h> int absval (int x) { int result = x; if (x >= 0) { if (x == 0) { result = 0; } } else { result = -x; } return result; } int main() { printf("absval(5) = %d\n", absval(5)); printf("absval(0) = %d\n", absval(0)); printf("absval(-10) = %d\n", absval(-10)); }

which has the expected behavior.

In SPARK (as in Ada), each if-statement has a matching end marker end if; so the dangling-else problem cannot arise. The above C code is written as follows:

function Absval (X : Integer) return Integer is Result : Integer := X; begin if X >= 0 then if X = 0 then Result := 0; end if; else Result := -X; end if; return Result; end Absval;

Interestingly, SPARK analysis detects here that the negation operation on line 9 might overflow. That's an example of runtime error detection which will be covered in the chapter on Detecting Undefined Behavior.