HP C
Language Reference Manual



1.1.1 Trigraph Sequences

To write C programs using character sets that do not contain all of C's punctuation characters, ANSI C allows the use of nine trigraph sequences in the source file. These three-character sequences are replaced by a single character in the first phase of compilation. (See Section 2.16 for an explanation of compilation phases.) Table 1-1 lists the valid trigraph sequences and their character equivalents.

Table 1-1 Trigraph Sequences
Trigraph Sequence Character Equivalent
??= #
??( [
??/ \
??) ]
??' ^
??< {
??! |
??> }
??- ~

No other trigraph sequences are recognized. A question mark (?) that does not begin a trigraph sequence remains unchanged during compilation. For example, consider the following source line:


printf ("Any questions???/n"); 

After the ??/ sequence is replaced, this line is translated as follows:


printf ("Any questions?\n"); 
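The replacement rule can be modeled at run time. The following is a minimal sketch, not the compiler's implementation: the function names trigraph_equivalent and replace_trigraphs are hypothetical, and the lookup strings simply restate Table 1-1. The source deliberately avoids writing two adjacent question marks so that it compiles the same way whether or not the compiler has trigraph processing enabled.

```c
#include <string.h>

/* Returns the character equivalent for the third character of a
   trigraph (per Table 1-1), or '\0' if no trigraph ends with it.
   Hypothetical helper, for illustration only. */
static char trigraph_equivalent(char third)
{
    static const char thirds[]       = "=(/)'<!>-";
    static const char replacements[] = "#[\\]^{|}~";
    const char *p;

    if (third == '\0')
        return '\0';            /* strchr would match the terminator */
    p = strchr(thirds, third);
    return p ? replacements[p - thirds] : '\0';
}

/* Copies src to dst, replacing each trigraph sequence with its
   single-character equivalent. A question mark that does not begin
   a trigraph is copied unchanged. dst must have room for
   strlen(src) + 1 characters. */
void replace_trigraphs(const char *src, char *dst)
{
    while (*src != '\0') {
        char eq = '\0';

        if (src[0] == '?' && src[1] == '?')
            eq = trigraph_equivalent(src[2]);

        if (eq != '\0') {
            *dst++ = eq;        /* three input characters become one */
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

When passing test input to this function, the \? escape sequence can be used to spell a question mark inside a string literal without forming a trigraph in the source file.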

1.1.2 Digraph Sequences

Digraph processing is supported when compiling in ISO C 94 mode (/STANDARD=ISOC94 on OpenVMS systems).

Digraphs are pairs of characters that translate into a single character, much like trigraphs, except that trigraphs get replaced inside string literals, but digraphs do not. Table 1-2 lists the valid digraph sequences and their character equivalents.

Table 1-2 Digraph Sequences
Digraph Sequence Character Represented
<: [
:> ]
<% {
%> }
%: #
%:%: ##
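As an illustration of Table 1-2, the following sketch writes a small translation unit with digraphs in place of #, brackets, and braces. It assumes a compiler mode that accepts digraphs (ISO C 94 or later); the function names are hypothetical. The second function shows that the characters <: inside a string literal are not replaced.

```c
%:include <string.h>

/* Sums a three-element array; <: :> stand for [ ] and <% %> for { }. */
int sum_with_digraphs(void)
<%
    int values<:3:> = <% 1, 2, 3 %>;
    return values<:0:> + values<:1:> + values<:2:>;
%>

/* Inside a string literal, the two characters <: are left as they are,
   so the string has length 2. */
size_t digraph_in_string_length(void)
<%
    return strlen("<:");
%>
```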

1.2 Identifiers

An identifier is a sequence of characters that represents a name for the following:

  • Variable
  • Function
  • Label
  • Type definition
  • Structure, enumeration, or union tag
  • Structure, enumeration, or union member
  • Enumeration constant
  • Macro
  • Macro parameter

The following rules apply to identifiers:

  • Identifiers consist of a sequence of one or more: uppercase or lowercase alphabetic characters, universal character names, the digits 0 to 9, the dollar sign ($), and the underscore character (_).
    Using the $ character provokes a warning from the compiler in strict ANSI mode.
  • Character case is significant in identifiers; for example, the identifier Test1 is different from the identifier test1.
  • Identifiers cannot begin with a digit.
  • Do not begin identifiers with an underscore; the ANSI C standard reserves such identifiers for internal names.
  • Each universal character name in an identifier must designate a character whose encoding in ISO/IEC 10646 falls into one of the ranges specified in Appendix F.
  • Keywords are not identifiers (Section 1.5 lists the C keywords).
  • Using the names of library functions for identifiers is bad practice (Chapter 9 lists the C library function names). A function with the same name as a library function will supersede the library function. This may be the desired outcome, but program maintenance can be confusing.
  • In general, identifiers are separated by white space, punctuators, or operators. For example, the following code fragment has four identifiers:


    struct employee { int number; char sex; } emp; 
    

    The identifiers are employee, number, sex, and emp; struct, int, and char are keywords.

An identifier without external linkage has at most 32,767 significant characters. An identifier with external linkage has 1023 significant characters on Tru64 UNIX systems and 31 significant characters on OpenVMS systems. (Section 2.8 describes linkage in more detail.) Case is not significant in external identifiers on OpenVMS systems.

Identifiers that differ within their significant characters are different identifiers. If two or more identifiers differ in nonsignificant characters only, they are treated as the same identifier.
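A short sketch of the case-sensitivity rule follows. The function name identifier_demo is hypothetical. The variables are declared inside a function, so they have no linkage and the OpenVMS rule for external identifiers does not apply to them.

```c
/* Test1 and test1 are two different identifiers because character
   case is significant. */
int identifier_demo(void)
{
    struct employee { int number; char sex; } emp;
    int Test1 = 1;
    int test1 = 2;

    emp.number = Test1;
    emp.sex = 'F';
    return emp.number + test1;   /* 1 + 2 */
}
```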

1.3 Universal Character Names

Universal character names provide a way to name other characters. They can be used in identifiers, character constants, and string literals to designate characters that are not in the basic character set.

A universal character name begins with a \u or \U and is followed by either four or eight hexadecimal digits.

The universal character name \Unnnnnnnn designates the character whose eight-digit short identifier (as specified by ISO/IEC 10646) is nnnnnnnn. Similarly, the universal character name \unnnn designates the character whose four-digit short identifier is nnnn (and whose eight-digit short identifier is 0000nnnn).

A universal character name cannot specify a character whose short identifier is less than 00A0, other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive.

See Appendix F for a list of valid universal character names.
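The relationship between the four-digit and eight-digit forms can be sketched as follows. The example assumes a compiler that accepts universal character names in wide character constants; the function name is hypothetical. The character chosen, 00E9, is LATIN SMALL LETTER E WITH ACUTE.

```c
#include <wchar.h>

/* \u00E9 and \U000000E9 designate the same character, so the two
   wide character constants compare equal. */
int ucn_forms_match(void)
{
    wchar_t short_form = L'\u00E9';
    wchar_t long_form  = L'\U000000E9';

    return short_form == long_form;
}
```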

1.4 Comments

Except within a character constant, string literal, or a comment, the /* character combination introduces a comment and the */ character combination ends a comment. The contents of such a comment are examined only to identify multibyte characters and to find the characters */ to terminate it.

Alternatively, the // character combination introduces a comment that includes all multibyte characters up to, but not including, the next new-line character. The contents of such a comment are examined only to identify multibyte characters and to find the terminating new-line character.

Comments cannot be nested; once a comment is started, the compiler treats the first occurrence of */ as the end of the comment.

To comment out sections of code, avoid using the /* and */ sequences. Using the /* and */ sequences works only for code sections containing no comments, because comments do not nest. A better method is to use the #if and #endif preprocessor directives, as in the following example:


#if 0 
/*  This code is excluded from execution because ...  */ 
code_to_be_excluded (); 
#endif 

See Chapter 8 for more information on the preprocessing directives #if and #endif.

Comments cannot span source files. Within a source file, comments can be of any length and are interpreted as white space by both the compiler and the preprocessor.

Examples:


"a//b"                   // four-character string literal 
#include "//e"           // undefined behavior 
// */                    // comment, not syntax error 
f = g/**//h;             // equivalent to f = g / h; 
//\
i();                     // part of a two-line comment 
/\
/ j();                   // part of a two-line comment 
#define glue(x,y) x##y 
glue(/,/) k();           // syntax error, not comment 
/*//*/ l();              // equivalent to l(); 
m = n//**/o 
+ p;                     // equivalent to m = n + p; 
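Two of the cases above can be checked in compilable form. This is a sketch that assumes a compiler mode in which // comments are recognized; the function names quotient and sum are hypothetical.

```c
/* The empty comment keeps the two slashes apart, so the
   expression is g / h. */
int quotient(int g, int h)
{
    return g/**//h;
}

/* The // sequence starts a comment that runs to the end of the line,
   so o is never part of the expression and the result is n + p. */
int sum(int n, int o, int p)
{
    (void)o;            /* o is intentionally unused */
    return n//**/o
        + p;
}
```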

1.5 Keywords

C defines several keywords, each with special meaning to the compiler. Keywords identify statement constructs and specify basic types and storage classes. Keywords cannot be used as identifiers and cannot be declared.

Table 1-3 lists the C keywords.

Table 1-3 Keywords
auto double int struct
break else long switch
case enum register typedef
char extern return union
const float short unsigned
continue for signed void
default goto sizeof volatile
do if static while
_Bool _Complex (ALPHA, I64) inline restrict
_Imaginary (ALPHA, I64)      

In addition to the keywords listed in Table 1-3, the compiler reserves all identifiers that begin with two underscores (__) or with an underscore followed by an uppercase letter. User variable names must never begin with one of these sequences.

Keywords are used as follows:

  • To assign a storage class to a variable or function (auto, extern, register, static)
  • To construct or qualify a data type (_Bool, char, _Complex (ALPHA, I64), const, double, enum, float, int, long, short, signed, struct, union, unsigned, void, volatile)
  • As part of a statement (break, case, continue, default, do, else, for, goto, if, return, switch, while)
  • To define a new named type (typedef)
  • To perform an operation (sizeof, __typeof__)

The following VAX C keywords are also recognized by the compiler in some modes (see Note 1):


_align 
globaldef 
globalref 
globalvalue 
noshare 
readonly 
variant_struct 
variant_union 

The following C99 Standard keywords are also recognized by the compiler in some modes (see Note 2):


inline 
restrict 

Using a keyword as a macro name is legal but not recommended. For example, the following definition changes the default size of a basic data type:


#define int short 

Here, the keyword int has been redefined as short, which causes all data objects declared with the int data type to be stored as short objects.
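The effect of the redefinition can be observed with sizeof. The following is a sketch only; the names size_before, size_during, and the two accessor functions are hypothetical. The macro is defined after the #include line and removed with #undef once it has been demonstrated, which is the cautious arrangement because standard headers are not written to tolerate redefined keywords.

```c
#include <stddef.h>

/* Size of int before the macro exists. */
static const size_t size_before = sizeof(int);

#define int short
/* While the macro is defined, the preprocessor rewrites every int
   token to short, so this is really sizeof(short). */
static const size_t size_during = sizeof(int);
#undef int

size_t int_size_before_macro(void) { return size_before; }
size_t int_size_under_macro(void)  { return size_during; }
```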

Note

1 Recognized on OpenVMS systems when /STANDARD=RELAXED (the default), /STANDARD=VAXC or /ACCEPT=VAXC_KEYWORDS is specified on the compiler command line. Recognized on Tru64 UNIX systems when -vaxc or -accept vaxc_keywords is specified on the compiler command line.

2 Recognized on OpenVMS systems when /STANDARD=RELAXED (the default), /STANDARD=C99, or /ACCEPT=C99_KEYWORDS is specified on the compiler command line. Recognized on Tru64 UNIX systems when -std (the default), -c99, or -accept c99_keywords is specified on the compiler command line.

1.6 Operators

An operator is a token that specifies an operation on at least one operand, and yields some result (a value, designator, side effect, or some combination). Operands are expressions or constants (a form of expression). Operators with one operand are unary operators, and operators with two operands are binary operators. For example:


x = -b;          /*   Unary minus operator   */ 
y = a - c;       /*   Binary minus operator  */ 

Operators with three operands are called ternary operators.
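The conditional operator (? :) is the ternary operator in C. A minimal sketch, using the hypothetical function name larger:

```c
/* Returns the larger of a and b. The conditional operator takes three
   operands: the test a > b, the result if true, and the result if false. */
int larger(int a, int b)
{
    return a > b ? a : b;
}
```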

All operators are ranked by precedence, a ranking system determining which operators are evaluated before others in a statement. See Chapter 6 for information on what each operator does and for the rules of operator precedence.

Some operators in C are composed of more than one character, while others are single characters. The single-character operators in C are:


!  %  ^  &  *  -  +  =  ~  |  .  <  >  /  ?  :  ,  [  ]  (  )  # 

The multiple-character operators in C are:


++    --    ->    <<     >>     <=    >=    ==    !=    *=    /= 
%=    +=    -=    <<=    >>=    &=    ^=    |=    ##    &&    || 

The # and ## operators can only be used in preprocessor macro definitions. See Chapter 8 for more information on predefined macros and preprocessor directives.

The sizeof operator determines the size of a data type. See Chapter 6 for more information on the sizeof operator.

The old form for compound assignment operators (=+, =-, =*, =/, =%, =<<, =>>, =&, =^, and =|) is not supported by the ANSI C standard. Use of these operators in a program is unsupported, and will produce unpredictable results. For example:


x =-3; 

This construction means x is assigned the value -3, not x is assigned the value x - 3.

The error-checking compiler option provides a warning message when the old form of compound assignment operators is encountered.
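The difference between the old form and the current form can be sketched as follows. Both function names are hypothetical, and the behavior shown is that of a compiler following the ANSI C tokenization rules.

```c
/* Old-form spelling: the tokens are x, =, -, 3, so x becomes -3. */
int old_form_assignment(void)
{
    int x = 10;
    x =-3;
    return x;
}

/* Current compound assignment: x becomes x - 3. */
int compound_assignment(void)
{
    int x = 10;
    x -= 3;
    return x;
}
```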

1.7 Punctuators

Some characters in C are used as punctuators, which have their own syntactic and semantic significance. Punctuators are not operators or identifiers. Table 1-4 lists the C punctuators.

Table 1-4 Punctuators
Punctuator Use Example
< > Header name <limits.h>
[ ] Array delimiter char a[7];
{ } Initializer list, function body, or compound statement delimiter char x[4] = {'H', 'i', '!', '\0' };
( ) Function parameter list delimiter; also used in expression grouping int f (x,y)
* Pointer declaration int *x;
, Argument list separator char x[4] = { 'H', 'i', '!', '\0'};
: Statement label labela: if (x == 0) x += 1;
= Declaration initializer char x[4] = { "Hi!" };
; Statement end x += 1;
... Variable-length argument list int f ( int y, ...)
# Preprocessor directive #include <limits.h>
' ' Character constant char x = 'x';
" " String literal or header name char x[] = "Hi!";

The following punctuators must be used in pairs:

  • < >
  • [ ]
  • ( )
  • ' '
  • " "
  • { }

Some characters can be used either as a punctuator or as an operator, or as part of an operator. The context of the occurrence specifies the meaning. Punctuators usually delineate a specific type of C construct, as shown in Table 1-4.

1.8 String Literals

Strings are sequences of zero or more characters. A character string literal is a sequence of zero or more multibyte characters enclosed in double quotation marks, as in "xyz". String literals can include any valid character, including white-space characters and character escape sequences. A wide string literal is the same, except prefixed by the letter L. Once a string is stored as a string literal, modification of the string leads to undefined results.

In the following example, ABC is the string literal. It is assigned to a character array where each character in the string literal is stored as one array element. Storing a string literal in a character array lets you modify the characters of the array.


char x[] = "ABC"; 

String literals are typically stored as arrays of type char (or wchar_t if prefaced with an L), and have static storage duration.
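A short sketch of modifying the array rather than the literal follows. The function name update_array_copy is hypothetical.

```c
#include <string.h>

/* x is an array holding a copy of the characters of "ABC" plus the
   terminating null character, so changing x[0] is well defined. */
int update_array_copy(void)
{
    char x[] = "ABC";

    x[0] = 'a';
    return sizeof x == 4 && strcmp(x, "aBC") == 0;
}
```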

The following declaration declares a character array to hold the string "Hello!":


char s[] = "Hello!"; 

The character array s is initialized with the characters specified in the double quotation marks, and terminated with a null character (\0). The null character marks the end of each string, and is automatically concatenated to the end of the string literal by the compiler. Adjacent string literals are automatically concatenated (with a single null character added at the end) to reduce the need for the line continuation character (the backslash at the end of a line).

Normal string literals and wide string literals can be concatenated, in which case the normal strings get promoted to wide strings, and a wide-string result is produced.
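The null termination and concatenation rules can be checked with sizeof, strlen, and wcslen. This is a sketch with hypothetical names; the mixed normal and wide concatenation assumes a compiler that implements the promotion described above.

```c
#include <string.h>
#include <wchar.h>

static const char s[] = "Hello!";

/* Two adjacent literals become the single string "Hello, world". */
static const char joined[] = "Hello, " "world";

/* A wide literal next to a normal literal yields one wide string. */
static const wchar_t wide_joined[] = L"wide " "string";

size_t s_array_size(void)   { return sizeof s; }            /* 6 characters + null */
size_t s_length(void)       { return strlen(s); }           /* null not counted    */
size_t joined_length(void)  { return strlen(joined); }
size_t wide_length(void)    { return wcslen(wide_joined); }
```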

Following are some valid string literals:


""            /*  Here's a string with only the null character */ 
 
"You can have many characters in a string." 
 
"\"You can mix characters and escape sequences.\"\n" 
 
"Long lines of text can be continued on the next line \
by using the backslash character at the end of a line." 
 
"Or, long lines of text can be continued by using " 
"ANSI's concatenation of adjacent string literals." 
 
"\'\n"        /*  Only escape sequences are in this string    */ 

To determine the length of a given string literal (not including the null character), use the strlen function. See Chapter 9 for more information on other library routines available for string manipulation.

1.9 Constants

There are five categories of constants in C:

  • Integer constants (such as 63, 0, and 42L)
  • Floating-point constants (such as 1.2, 0.00, and 77E+2)
  • Hexadecimal floating-point constants (such as 0x1P-1 or 0x.1P3, both of which represent 1/2)
  • Character constants (such as 'A', '0', and L'\n')
  • Enumeration constants (such as NO and YES in the declaration enum boolean { NO, YES };)

The following sections describe these constants.

The value of any constant must be within the range of representable values for the specified type. Regardless of its type, a constant is a literal or symbolic value that does not change. A constant is also an rvalue, as defined in Section 2.14.
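One constant from each category can be sketched as follows. The function names are hypothetical, and the hexadecimal floating-point constants assume a compiler mode that accepts them.

```c
enum boolean { NO, YES };

long   integer_constant(void)      { return 42L; }
double floating_constant(void)     { return 77E+2; }   /* 77 times 10 squared */
double hex_floating_a(void)        { return 0x1P-1; }  /* 1 times 2 to the -1 */
double hex_floating_b(void)        { return 0x.1P3; }  /* 1/16 times 2 cubed  */
int    character_constant(void)    { return 'A'; }
int    enumeration_constant(void)  { return YES; }     /* NO is 0, YES is 1   */
```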


© 2007 Hewlett-Packard Development Company, L.P.