C Pointer and Array Summary
C Types and Operators Related to Pointers
PREREQUISITES — You should already…
- have some C programming experience.
- understand the concept and purpose of types in a statically typed language.
- understand expressions, operators and precedence.
Background
Variables of an array type behave differently from other variables. In most expressions they decay into a pointer expression, i.e., an expression whose resulting value has a pointer type. This creates some confusion, since they result in pointer values, but are not pointer variables.
Addresses in General
In assembler, an address is just a number that represents the location of a byte in memory. Assembler programs must remember how many bytes they stored in memory, starting at that byte location, and choose the appropriate machine instruction to fetch the correct number of bytes.
In C, although we must be aware that all variables have an address, the type we choose for the variable will determine the machine instructions the compiler generates when we access the variable. This is much simpler — we only have to consider the types of our variables, and the compiler will determine the size and machine instructions.
Pointers
The term pointer type is a categorisation, or classification, of a potentially infinite list of possible types, which all share the same fundamental characteristic. It is an abstraction for the concept of “address of a particular type of value”. There is no type called “pointer”. Pointer types are derived from other, existing types.
Given that T represents any type, built-in or user-defined, then T* is a pointer type derived from T, which we pronounce as “T pointer”. It means “address of a T value”. So, int* means the address of an int value, and double* means the address of a double value, and so forth.
Obtaining Addresses
The address of any lvalue expression can be obtained with the address-of operator. An lvalue expression is an expression that represents (generally) modifiable memory. As variables are lvalues, we can take the addresses of variables with the address-of operator. The results of most operators are not lvalues, so you cannot take the address of, for example, the return value of the function call operator.
int I = 123; // `int` variable. It has an address.
int* P = &I; // address of `int` gives `int*`
int** Q = &P; // address of `int*` gives `int**`
printf("The address of I = %p\n", (void*) &I);
printf("The value of P = %p\n", (void*) P);
printf("The address of P = %p\n", (void*) &P);
printf("The value of Q = %p\n", (void*) Q);
printf("The address of Q = %p\n", (void*) &Q);
Arrays automatically result in “the address of the first element” when they appear in most expressions (the operands of sizeof and of the address-of operator are the main exceptions). Considering that A[0] represents the first element of an array A of T values, then &A[0] is a legal expression, but it is unnecessary, because A by itself already results in the same value.
int A[3] = {11, 22, 33};
int* P = A; // store address of `11` in `P`.
int* Q = &A[0]; // store address of `11` in `Q` (long way).
The type of an expression which is the result of taking the address of an array variable must mean “address of an array of N elements of type T”. So the type of the expression &A, for A above, must mean “address of an array of 3 elements of type int”. The syntax for such a type is int(*)[3], which is unfortunate (strange, obscure and limited). We pronounce it “pointer to array of 3 elements of int”.
Another instance where pointers are automatically generated is the case of literal strings. A literal string ultimately results in a pointer to the first character. For simple literal strings, e.g. "ABC", the result will be a char*, whereas for L"ABC" (wide character literal string), the result will be a wchar_t*.
Indirection
Pointers would not be very useful if we did not have any operators that could work with them. There are not many of these operators, but they are crucial. The most important is the indirection operator. The indirection operator can represent memory, with exactly the same consequences and abilities as a variable (which is the traditional way to represent memory).
Definition — Indirection
Given: E –t→ T* (any expression E, of type T*);
Then: *E is read as “indirect E”, and
Means: represent the T value at address E, and so, has
Type: T.
Remember that the value of E is an address.
putchar(*"ABC"); // represent the first `char` at address of `"ABC"`.
int I = 123;
int* P = &I;
printf("I = %d\n", I); // output: 123
printf("&I = %p\n", (void*) &I); // output: ADDR_I
printf("P = %p\n", (void*) P); // output: ADDR_I
printf("*P = %d\n", *P); // output: 123
printf("&*P = %p\n", (void*)&*P); // output: ADDR_I
*P = 456;
printf("I = %d\n", I); // output: 456
printf("*P = %d\n", *P); // output: 456
In the example above, *P represents the same memory as I, but only because P currently contains the address of I. If we later put the address of some variable J into P, then *P will represent the same memory as J.
Indirection expressions are expressions where the last operator evaluated is the indirection operator. Indirection expressions produce lvalues (like non-const variables). Hence the indirection operator is one of the few operators that produce a result which can be assigned to (*P = 456; above). This also means that you can take the address of the result of an indirection operator, like &*P (the precedence rules will evaluate the indirection first). This is of course pointless, since the result is exactly the same as just P by itself.
Pointer values are most useful when the name of a variable is not available (not in scope). This is why they are most commonly used as the type for some parameters: if we want a function that can modify a variable, even if it is not in the scope of the function, we can simply pass the address of the variable.
The following function will swap the values of any two int variables. We pass it the addresses of the two relevant variables:
Pointer Arithmetic
A surprisingly useful feature in C is that of pointer arithmetic, where 1+1 does not necessarily equal 2. The rule states that we can add or subtract integer type values to or from pointer type values. The result is calculated using pointer arithmetic, without changing the type of the pointer. The result is determined as follows:
DEFINITION — Pointer Arithmetic
Given: E –t→ T* (any expression E, of type T*), and I (any expression of integer type);
Then: E + I, I + E and E - I are legal (I - E is not), and the result is
Calculated as: E +/- I * sizeof(T), in terms of byte addresses, with
Type: T*.
Addition is commutative, so I + E will produce the same result as E + I. Subtraction is not: an integer minus a pointer is not a legal expression. Pointer arithmetic also applies to the increment and decrement operators.
NOTE — Void Expressions
Apart from assignment (including argument passing and function returns), comparison, and casting to another pointer type, almost no operators work with void* expressions. In particular, neither indirection nor pointer arithmetic is available in standard C, because void has no size.
Subscript Subterfuge
The subscript operator is not a real operator. It just represents a particular (easy-to-read) pattern, which is rewritten upon compilation to a more fundamental expression, and only then compiled.
Given the pattern A[I], it is translated to *(A+I), which is a combination of pointer arithmetic and indirection, in that order. The pattern I[A] is, by the same rule, translated to *(I+A). Because of the commutativity rule, all four expressions produce the exact same result (22 in the examples below). However, this is not understood by many programmers, so one should rather use the A[I] form, which is better understood by most, albeit superficially.
int A[3] = { 11, 22, 33 };
int* P = A;
printf(" A[1] = %d\n", A[1] ); // preferable.
printf(" 1[A] = %d\n", 1[A] );
printf(" *(A+1) = %d\n", *(A+1) );
printf("*(1+A) = %d\n", *(1+A) );
printf(" P[1] = %d\n", P[1] ); // preferable.
printf(" 1[P] = %d\n", 1[P] );
printf(" *(P+1) = %d\n", *(P+1) );
printf("*(1+P) = %d\n", *(1+P) );
As the example illustrates, the subscript operator is simply disguised pointer arithmetic and indirection. These operators do not care where the operands originate from — whether from an array expression, a pointer variable, a function return, or whatever.
Indirect Member Selection
The indirect member selection operator is another non-operator. Given a pointer to a structured type, e.g. P of type S*, a member M of the structure can be selected with:
(*P).M
The parentheses are required, because member selection has a higher precedence than indirection. Since that seems a bit verbose, the following operator can be used instead, but it will be translated to the expression above:
P->M
Because the indirect member selection operator is more concise, it is recommended that you use it instead, even if it is not a real operator [1].
Concluding Remarks
All rules that apply to indirection expressions also apply to the array subscript operator and to the indirect member selection operator.
Pointer arithmetic and indirection are at the heart of all C code — not much can be accomplished without them. This is exactly the same in assembler, except that we do not have the convenience of types, and automatic pointer arithmetic based on types.
These types and operators are fundamentally simple, but not often clearly explained. Considering that they are the cause of many bugs, C programmers should strive to completely understand these rules and operators. The result will be more than worth the effort.
[1] In C++, the indirect member selection operator can be overloaded, in which case it becomes very real, and will not be translated.
2018-05-24: Added note about the pointer-to-array type. [brx]
2017-11-18: Update to new admonitions. [brx]
2017-09-23: Edited. [jjc]