MISRA Discussion Forums

Full Version: I question the need of rule 10.6
This rule seems to be one that most people I have spoken to commonly deviate from. I have given it thought, but I still cannot understand how it would lead to more secure programs.

The only argument provided by MISRA for this rule is that “the type of an integer constant is a potential source of confusion”. Yes it is confusing, but only from a theoretical point of view, namely when you are determining which implicit type conversions are applied to the integer constant. That is, during a "scratch your head while learning C" session. During actual programming, you don't need to think about this.

I can't come up with one single scenario where leaving out the “U” suffix will lead to bugs or unintended behavior. Can anyone make such an example? If one integer in the operation is signed and the other is unsigned, and they are both of the same size, the "usual arithmetic conversions" will make sure both are unsigned. Consider the following example:

result = -1 - 10;

In this example, the type of -1 is "signed int" and the type of 10 is "int". If "int" is treated as signed by the compiler, there will not be any problems as both operands are then signed.
If "int" is treated as unsigned, then the operand -1 will be converted to unsigned, that is, to the value 65535 (assuming a 16-bit system). The result of the expression will then be 65535 - 10 = 65525, which may not be the result the programmer expected. However, this will not lead to a bug! Because we aren't done with that line.

If "result" is of signed type, then the result 65525 will be converted to signed during the assignment, making it -11, which is the expected result.
If "result" is of unsigned type, the code does not make sense. If you try to stuff signed constants into an unsigned variable, no U suffix will save you.

MISRA fails to name a case where the lack of a “U” suffix would cause bugs. Integer constants are by their nature constant, and therefore deterministic. ISO C ensures that they are of the right size and signedness. It all boils down to what type the result is stored in. If the expression happens to be of a different type than the variable, an implicit conversion will be done upon assignment to correct this.

Rule 10.6 also states that the implemented sizes of integer types are a reason why the U suffix should be used. I fail to see how this is relevant to the use of the U suffix. The example says that 40000 will be int or long depending on the system... this has nothing to do with signedness at all. The example says that 0x8000 is possibly signed in a 32-bit environment... so what? It is still a positive number in 32 bits. And you won't notice the signedness of a hex number until you represent it in a decimal format, and then it is up to you whether you want to display it as signed or unsigned.

Also, integer constants are never subject to the integer promotions, as they are always "int" or "long", so no need to worry about that either.

Further, writing “U” all over the source code is not common practice among programmers. It causes much more confusion than the hidden signedness of integer constants and, worst of all, it makes the code harder to read. In addition, the MISRA-C document itself is filled with examples where MISRA themselves do not even follow this rule. I can easily come up with countless examples; just open a random page and you will likely find one. The rule is simply too rigid and not practical to use in reality.

Though perhaps there exists a case when the lack of this suffix will lead to bugs? If not, I don't see any reasons keeping this rule.
Lundin Wrote:result = -1 - 10;

First, the '-1' is not a constant, but a unary minus operator with a constant. Negative constant literals do not exist in C, AFAIK.

Lundin Wrote:Though perhaps there exists a case when the lack of this suffix will lead to bugs? If not, I don't see any reasons keeping this rule.

You may be interested to check my blog post here http://www.bezem.de/2009/03/why-32768-is...as-0x8000/.
The example might look a little contrived, but this issue actually occurred in software I am responsible for, even though the given example was modified in some ways on extraction from the real code. The discrepancy is apparent from 32768 to 65535 inclusive, and also exists in C99 between long and long long constants.

In this context, I'm absolutely against removing rule 10.6.

BR,

Johan
The problem in that example is that the programmer is unaware of the signedness problems with the standard integer types, and is using high values close to the int16 maximum boundary without stopping to think twice. There is no MISRA rule solving this bug, afaik.

What will 10.6 solve in this case then? Nothing!

#define C_DECIMAL 32768U
#define C_HEXADECIMAL 0x8000U

~ is still performed on a uint16. Oops.

Now if you had written 0x8000L that would have solved the problem, but the L suffix is off-topic and has nothing to do with 10.6. My opinion remains the same: I cannot think of a scenario where rule 10.6 would actually prevent bugs.
Rule 10.6 is not relevant to the example which you quote ("result = -1 - 10;").

It is unfortunate that Rule 10.6 is located in the section devoted to "Arithmetic type conversions". This is misleading because the rule has nothing to do with type conversions. It has to do with the requirement to apply a "U" suffix to a constant IF the constant is of unsigned type.

The type of an integer constant depends on several things:

a) The magnitude of the constant
b) The implemented size of the various integer types
c) The number base in which the constant is expressed
d) The presence or absence of suffixes

The type of a constant does NOT depend on the context in which the constant is used.

In the example quoted, there are 2 integer constants "1" and "10", both of which are of type "signed int". Note that "-1" is NOT a constant; it is a "constant expression" - i.e. the constant "1" preceded by the unary minus operator.

See ISO:C90 6.1.3.2
"The type of an integer constant is the first of the corresponding list in which its value can be represented.
  • Unsuffixed decimal: int, long int, unsigned long int;
  • Unsuffixed octal or hexadecimal: int, unsigned int, long int, unsigned long int;
  • Suffixed by the letter u or U: unsigned int, unsigned long int;
  • Suffixed by the letter l or L: long int, unsigned long int;
  • Suffixed by both the letters u or U and l or L: unsigned long int."

This means that in an implementation with a 16 bit int and 32 bit long:
  • "0x8000" has type unsigned int
  • "2147483648" has type unsigned long

Rule 10.6 is only relevant to constants of large value and requires that a "U" suffix should be appended to both these constants - in order to make it obvious that they are unsigned constants.

The question of whether a signed or unsigned constant should be used in a particular context is addressed in Rule 10.1. Rule 10.1 demands (among other things) that a constant assigned to a signed object should be of signed type and a constant assigned to an unsigned object should be of unsigned type. It can be argued that the latter requirement is a little pedantic.

However, where the type of a constant is more significant is in expressions where "type balancing" occurs - what the ISO standard describes as "the usual arithmetic conversions". The intention of Rule 10.1 is to ensure that type balancing never occurs between two operands of different signedness. This is to ensure that the signedness of an expression is never ambiguous. Consider for example, the expression "u16a - 100", an operation involving an unsigned variable and a constant of type signed int; if an int is implemented in 16 bits the result will be of type unsigned int. If an int is implemented in 32 bits the result will be of type "signed int" (and could be negative).

To reiterate: Rule 10.6 applies only to integer constants that, according to the C standard, are of unsigned type.
I beg to differ. If you use
Code:
#define C_DECIMAL     32768U
#define C_HEXADECIMAL 0x8000U
both constants will be considered to be of type 'unsigned int', the invert operation will invert 16 bits, and the conversion to 'long' will widen the values to 0x00007FFFL.
OK, it now depends on what you wanted in the first place, but at least both values are handled identically, and arguably "do what the programmer intended".
IMHO this is much better than without the suffixes.
It would be nice if a compiler could/would warn about a situation like 0x00008000 being used as a constant. By supplying the leading zeroes the programmer at least suggests that she/he intends to provide 32 bits of value. However, the compiler will still create a 16-bit unsigned integer. Here's where tools like PC Lint fill the gap, if you let them.

FWIW,

Johan
Fine, my example isn't relevant to the rule. Then please provide an example where failure to conform with rule 10.6 will lead to bugs, increase misunderstandings or confusion, or otherwise affect the safety of the system.

Quote:Consider for example, the expression "u16a - 100", an operation involving an unsigned variable and a constant of type signed int; if an int is implemented in 16 bits the result will be of type unsigned int. If an int is implemented in 32 bits the result will be of type "signed int" (and could be negative).

What does the U suffix solve then? If u16a doesn't underflow through the operation, the type of "100" doesn't matter. If u16a does underflow, how will you be aided by getting the result as an unsigned integer? Let's assume the programmer has written some suspicious code like this:

if((u16a - 100) > 32767)

Ok that rare scenario will cause problems (mainly because it is obfuscated, but let us ignore that for the discussion's sake). Now let us assume that 10.6 was not part of MISRA-C. How would we then make that code compliant?

First we have the advisory rule 12.11 "evaluation of constant unsigned integer expressions should not lead to wrap-around" explicitly banning the above code. But since it is only advisory, the user may ignore that rule. Then we have rule 10.1, forbidding any implicit casts on the complex expression u16a - 100. The underlying type of the expression is uint16_t, so in order to conform with rule 10.1 we must change the code to

if( (u16a - (uint16_t)100) > (uint16_t)32767 )

In this example, rule 10.6 is apparently superfluous. The example does, however, illustrate the importance of rule 12.11; perhaps that one should be made mandatory?


Quote:both constants will be considered to be of type 'unsigned int', the invert operation will invert 16 bits, and the conversion to 'long' will widen the values to 0x00007FFFL.
OK, it now depends on what you wanted in the first place, but at least both values are handled identically, and arguably "do what the programmer intended".
IMHO this is much better than without the suffixes.

First of all, I just realized your original code violates MISRA 12.7 "Bitwise operators shall not be applied to operands whose underlying type is signed". If you do unorthodox, non-MISRA-compliant operations on large hexadecimal constants close to INT_MAX and then treat the result as signed decimal values, yes, of course your program will run amok. Doing so is unwise in three different ways: because of signedness, because of implicit conversions, but also because ISO C allows signed representations other than two's complement (ISO 9899:1999 6.2.6.2). Your code assumes two's complement, and thus relies on implementation-defined behavior, which you must explicitly document according to MISRA.

To sum it up, people mixing signed integers and hexadecimal notation are asking for trouble whether they follow 10.6 or not. This is why we have rule 12.7 - it is a very good rule.

Quote:It would be nice if a compiler could/would warn about a situation like 0x00008000 being used as a constant. By supplying the leading zeroes the programmer at least suggests that she/he intends to provide 32 bits of value. However, the compiler will still create a 16-bit unsigned integer. Here's where tools like PC Lint fill the gap, if you let them

This isn't relevant to the U suffix. The solution is the L suffix.
The intention of my code sample has never been to show MISRA-compliant code, just to show one of the many bizarre properties of the C language standard. Using a 'U' suffix in my sample helps in that the behaviour of the code becomes identical in both cases. I just wanted to make a case in favour of the use of the 'U' suffix for unsigned constants, nothing more, but nothing less.
Failing to observe Rule 10.6 in my sample leads to bugs, increases misunderstandings and confusion, and affects the safety of my system. That my code additionally violates a plethora of other MISRA rules is IMHO not the point.
If you don't like the use of suffixes on constants, fine with me. If casting every constant to the appropriate type is more to your liking, you'll get even better results, since the type information is more accurate. I take the liberty of considering that less readable than suffixes, but that's my personal opinion.
Quote:people mixing signed integers and hexadecimal notation are asking for trouble
The reason I wrote my blog entry was to alert people to the fact that a hexadecimal number and a decimal number of the same value may be treated differently depending on the actual value and the types applicable to those values. Many developers using constants cannot even tell you the actual type of a constant they use, and they should care, in order to avoid inadvertently mixing signed numbers and hexadecimal notation.

And yes, you are right, my last example has got nothing to do with 'U', but with 'L'.

I consider MISRA a tool to help developers, not a language specification for a subset of C. But again, that's just my opinion.

Regards,

Johan
Let me state a simpler example:
Code:
uint8_t Some_Var = 100;
uint16_t Int_Var = 5000;

W.r.t. rule 10.6, why should there be a problem with the above definitions?
Both constants, 100 and 5000, have type int, so Rule 10.6 does not apply to them.

However, both initialisations violate Rule 10.1 because the constant is being implicitly converted to an unsigned type.