Addressing Pointers

Concepts of, and Operations on, Pointers

C's pointer types, and the operations on them, are vital. Misunderstanding them often causes program failures. Mastering these aspects is invaluable for C/C++ programmers. This discussion covers all C99 rules for pointers, which also apply to C++, but is not necessarily a beginner's guide. C++ references are not covered.

PREREQUISITES — Readers should already…
  • have some programming experience, preferably in C.
  • understand the concept and purpose of types in a statically typed language.
  • understand expressions, operators and precedence.

Background

The model of memory, as represented to programs in most CPU architectures, is linear. To pro­grams, memory is a sequential arrangement of bytes. Each of these bytes can be ac­ces­sed by ad­dress, which is another way to say: ‘every byte has an address’.

NOTE: Architecture and Environment

Examples and values here are for a typical 32-bit, little-endian Intel x86 architecture, with 4-byte integers (sizeof(int) is 4). Addresses are arbitrary, and for reference only. These choices do not alter the discussed concepts and rules.

Typographical Conventions

Only a small number of unusual typographical conventions are employed here, in particular:

The other conventions are self-explanatory, like employing a monospaced font for code.

Memory Layout

The following assembler code extract (Intel® x86 syntax) allocates 4 variables (C, S, I and L), each having different sizes:

x86: Allocating variables in assembly
   C     DB  052h               ; BYTE  (82 in decimal)
   S     DW  -1                 ; SHORT (FFFF in hex, 2's complement)
   I     DD  123456             ; DWORD (0001E240 in hex)
   L     DQ  0102030405060708h  ; QWORD

An example memory layout model is represented in the following diagram. Note that the address of each variable is the address of the first byte of the sequence that comprises its value. This is the byte with the lowest address, regardless of the endianness of the architecture.

figure: Linear Memory Layout

The equivalent example in C is shown below. Unlike assembler, C affords little control over the order in which variables are allocated, or where they are allocated. The memory layout may not exactly match the model above.

For performance reasons, variables may be aligned on 2-byte, 4-byte or 8-byte boundaries, so gaps may exist between the variables (padding). This is possible in assembler as well, but must be di­rect­ly con­trolled by the programmer. C compilers generally default to per­form­ance align­ment, but can often be con­trolled with a compiler switch / option.

c: Example variable definitions & initialisation
// variable definitions --------------------- typical size in bytes
   char  C     = 'R';                      //← 1 (BYTE)
   short S     = -1;                       //← 2 (WORD)
   int   I     = 123456;                   //← 4 (DWORD)
   long long L = 0x0102030405060708LL;     //← 8 (QWORD)

// output statements ------------------------ outputs (values only)
   printf("&C = %p, C = %c\n",   &C, C);   //← R
   printf("&S = %p, S = %d\n",   &S, S);   //← -1
   printf("&I = %p, I = %d\n",   &I, I);   //← 123456
   printf("&L = %p, L = %lld\n", &L, L);   //← 72623859790382856

For brevity, the addresses passed to printf() were not cast to void*, which is what "…%p…" ex­pects. This is seldom an issue, apart from a potential compiler warning. The ad­dress-of op­e­ra­tor (&) used in the example, is discussed later.
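
To make the 'address of the first byte' idea concrete, the following sketch views an int through an unsigned char pointer. The helper names byte_at and is_little_endian are invented for this illustration, and a 4-byte int is assumed, as elsewhere in this discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Return byte number `index` of the int at `p`, counting upwards from
 * the variable's own address (hypothetical helper, for illustration). */
unsigned char byte_at(const int *p, size_t index) {
    assert(p != NULL && index < sizeof *p);
    const unsigned char *bytes = (const unsigned char *)p;
    return bytes[index];
}

/* Non-zero when the least significant byte sits at the lowest address. */
int is_little_endian(void) {
    int probe = 1;
    return byte_at(&probe, 0) == 1;
}
```

On the little-endian x86 assumed here, byte_at(&I, 0) for I = 123456 (0001E240 in hex) would yield 0x40; on a big-endian machine that byte would be last. In both cases, &I is the address of the lowest-addressed byte.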

Memory Divisions

The space for the machine code, stack, static memory, and dynamic memory, are generally in se­pa­rate sections or segments. That does not necessarily prohibit a program from accessing any part of mem­o­ry; it is simply how memory is organised. Some architectures or operating systems, for se­cu­ri­ty rea­sons, may prevent programs from accessing the machine code division or code segment, as data (read or write).

This does not affect operations on pointers, only what is allowed to be accessed, and it fa­cil­i­tates a better understanding of the C memory environment. The stack and dynamic memory of­ten grow to­wards each other. Static memory is fixed at compile time, and ini­tia­lis­ed before accessed. The program mem­o­ry is also fix­ed, but is of little con­cern, since it is not directly ma­nip­u­la­ted in pro­grams.

Addresses

Machine instructions often take arguments as addresses. Some instructions use these addresses to fetch bytes from memory, starting at the first byte, and store them in a register, potentially swapping the bytes based on CPU endianness. Endianness doesn't influence how we handle addresses or understand C pointers.

Definition 1: Address
An address is a number that represents the location of a single byte in memory. It pro­vides no se­man­tics regarding the number of bytes that comprise the value stored there.

In assembler, an address is just a number. Programmers decide when a number should be treat­ed as an ad­dress. Other times, a number might represent the count of items in a col­lec­tion, or the age of a per­son, or the code of a character. Without context, a number is just a number, es­pe­cial­ly as far as ma­chine code is concerned. How the number is applied, or which machine code instructions are util­ised, determines its meaning from a prob­lem-solv­ing per­spec­tive.

Any non-trivial machine code program, regardless of how it was written, will be littered with address values, used immediately, or indirectly, to access other values in memory. Addresses are therefore fundamental to the operation of programs in a computer.

TERMINOLOGY: Immediate Values

In assembler, values that form part of the machine instructions (op-codes), are called im­me­di­ate values. This means they exist in the code segment, as opposed to values (vari­ables, even read-only vari­ables), that reside in one of the data segments.

Macro assemblers allow programmers to name memory locations, but even so, the names simp­ly result in addresses. This does not absolve the programmer from using these named ad­dres­ses cor­rect­ly. The number of bytes comprising the values fetched (or stored), de­pends on the in­struc­tions chosen.

Indirection

Some machine instructions can be supplied with a register or memory location, and use the val­ue stored there as an address, which in turn indicates the location where a BYTE (1-byte), WORD (2-byte), DWORD (4-byte), QWORD (8-byte), or longer value, can be fetched or stored. This pro­cess is called de­ref­er­enc­ing, or in­di­rec­tion.

Definition 2: Indirection
When a value in memory is accessed by first obtaining its address from another value, the pro­cess is called indirection, and will require prior know­ledge re­gar­ding the number of bytes that com­prise the value.

Simplistically, the following assembler extract shows one variation on an in­di­rect mem­o­ry fetching instruction. It loads the value of V into a CPU register called EAX, by getting the ad­dress of V from another register, called EBX. Obviously, the correct address of V must be in EBX. The same in­di­rec­tion in­struc­tion can therefore fetch different variables, by simply changing the ad­dress value in EBX.

x86: Indirection in x86 assembly
   V        DD 123456       ; DWORD variable (0001E240h).
···
   lea EBX, V               ; load address of V into EBX.
   mov EAX, dword ptr [EBX] ; indirectly, via EBX, load the
                            ; value of V into EAX.

The diagram below attempts to illustrate the operation. In the PDF versions of this material, it may be on the next page.

figure: Indirection via Register EBX

The address of V could have been stored in another memory location, for example named P, and an­other instruction to fetch the value of V, indirectly through P, could have been used. This has no ef­fect on the principle outlined here.
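
In C terms, the lea/mov pair corresponds roughly to taking an address and then reading through it. The sketch below approximates that idea rather than translating the assembly; the function names are made up for this example. The pointer's type (int*) supplies the prior knowledge of how many bytes to fetch.

```c
#include <assert.h>
#include <stddef.h>

/* Fetch an int indirectly: `address` plays the role of EBX. */
int load_via(const int *address) {
    assert(address != NULL);
    return *address;             /* like: mov EAX, dword ptr [EBX] */
}

/* Store an int indirectly through the same kind of address. */
void store_via(int *address, int value) {
    assert(address != NULL);
    *address = value;
}
```

Calling load_via(&V) and then load_via(&W) reuses the same fetching code for two different variables, which is the point made about changing the address in EBX.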

Memory Address Zero

Especially in architectures with operating systems, a user program is not loaded at address 0 (the be­gin­ning of physical memory). Practically, this means that, for all intents and purposes, address 0 is an illegal address. So much so, that it is often represented with a keyword, or spec­ial val­ue in pro­gram­ming lang­uages, like nil, null, nullptr, or NULL. In C and C++, a null pointer does not necessarily have the value 0 — it's an abstraction.
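
A short sketch of how the null pointer abstraction is typically used in C: the constant NULL, or a literal 0 in pointer context, compares equal to any null pointer, whatever bit pattern the implementation uses for it. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Read through `p` only when it is not a null pointer; otherwise
 * return `fallback`. */
int read_or(const int *p, int fallback) {
    if (p == NULL)               /* equivalently: if (!p) or if (p == 0) */
        return fallback;
    return *p;
}

/* The literal 0 in pointer context and NULL both yield a null pointer. */
void check_null_forms(void) {
    int *a = 0;
    int *b = NULL;
    assert(a == b);
    assert(read_or(a, -1) == -1);
}
```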

More sophisticated architectures will allow an operating system to let a program think it has access to all of memory, starting from address 0, while that memory is actually mapped to another location in physical memory via hardware. Furthermore, a program may be loaded at random memory locations on every run, for security reasons.

Data Movement

In computer terminology, moving a value in memory, means making a copy. In assembler, programmers explicitly choose whether to move data to the memory location of a vari­able, or whether the address of the variable should be loaded. In a pro­gram­ming lang­uage like C, where variables are generally represented by a variable identifier, the com­pi­ler will emit one of two kinds of in­struc­tions, depending on context.

In either case, the name of the variable itself, does not automatically cause it to be fetched. It de­pends entirely on the operators in the same expression. If an operator with a side-ef­fect (one that mo­di­fies memory) is used on the variable, that instruction is performed. On the other hand, if the vari­able is used in conjunction with other operators, the val­ue is fetched and placed in the ex­pres­sion as a temporary.

Lvalues

C uses the term lvalue to refer to expressions, like variables, that represent memory. Several operators will report that an lvalue is required, when an incorrect operand is supplied. Not all lvalues can be modified. Array variables, for example, are lvalues that cannot be assigned to, while the results of most operators are not lvalues at all. Variables defined with the const qualifier are lvalues, but are read-only. Lvalues will always have an address. Addresses of temporary variables cannot be taken.

Some operators can also represent memory, and their results qualify as lvalues. These are the indirection, subscript, and indirect member selection operators. As we shall see later, they all result in indirection expressions. If the indirection expression represents an array, or const-qualified memory, the result is not a modifiable lvalue.

Rvalues

The term rvalue refers to expressions that are not necessarily required to represent memory. It means the operators applied to it only require a value, and need not modify memory. Vari­ables can be rvalues, but so can literals, and the results of expressions.

Rvalue expressions need not have addresses. For example, literals and enumerated values of­ten re­sult in immediate values in the machine code, and their addresses cannot be ob­tain­ed. Any temporary results of expressions, like return values of functions, are rvalues, often called temporaries, and may even be stored in CPU registers.

Assignment

The most obvious data movement is performed with the assignment operator. This includes all the com­pound assignment operators. The assignment operator is not to be confused with ini­tial­i­sa­tion syn­tax, which also uses the equals sign. The same implicit type conversion rules apply how­ever.

Assignment operators, apart from the results they place in expressions, as is the case with all operators, also modify memory; meaning the assignment operators are part of the set of side-effect operators mentioned above.

Argument Passing

When an argument is passed to a function, it serves as initialisation of a parameter, which is a special kind of local variable in the function being called (the callee). Parameters have the same scope and lifetime as local variables — for all practical purposes, they are local variables, except that the caller can ini­tial­ise them with arguments.

Although we use phrases like ‘pass an argument’, that is really an abstraction for ‘initialise a special parameter variable’, so that given a function F, returning void, with one parameter param of type int, i.e.:

void F (int param)

then calling the function as: F(123); is equivalent to this C pseudocode:

F(int param=123)

Syntax resembling this can be seen in languages that allow the arguments to be named when calling a function. In C#, for example, this would be a legal call: F(param: 123).
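
The 'parameter is an initialised local variable' view can be checked directly: changing the parameter inside the callee leaves the caller's variable untouched. A minimal sketch, with invented function names:

```c
#include <assert.h>

/* `param` is a local variable, initialised from the caller's argument. */
int bump(int param) {
    param = param + 1;           /* modifies the callee's copy only */
    return param;
}

/* Returns 1 when the caller's variable survives the call unchanged. */
int caller_is_unaffected(void) {
    int x = 123;
    int y = bump(x);             /* conceptually: int param = x; */
    assert(y == 124);
    return x == 123;
}
```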

Function Returns

The return statement conceptually assigns the result of an expression to a temporary anon­y­mous ‘function result’ variable. Abstraction: given for example, a function that returns a type T, then the statement:

return expr;

effectively results in this pseudocode:

T retvar = expr;
assembler: reclaim local variable memory (stack)
assembler: restore saved registers
assembler: jump to address after call.

The value of retvar will be available in the calling expression as the result of the function call operator, as a temporary variable. Often retvar may be a register for the sake of efficiency, but the abstraction remains valid.

IMPORTANT: About Data Movement

The point about data movement is that, regardless of which kind of move­ment is used, a source type and a destination type are involved. In all these cases, if the source and des­ti­na­tion types do not match, the compiler will, if possible, provide implicit type con­ver­sion on the source type to match the destination type.
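
The sketch below exercises the three kinds of data movement (return, argument passing, assignment), each applying an implicit conversion because source and destination types differ. The function names are invented for illustration only.

```c
#include <assert.h>

/* Return: the double expression is converted to the int return type. */
int as_int(double source) {
    return source;               /* fractional part is discarded */
}

/* Argument passing: an int argument initialises a double parameter. */
double half(double param) {
    return param / 2;
}

/* Assignment: an int source is converted to a long destination. */
long widened(int source) {
    long destination;
    destination = source;
    return destination;
}

/* Usage sketch: each call involves one implicit conversion. */
void conversion_examples(void) {
    assert(as_int(3.99) == 3);
    assert(half(5) == 2.5);      /* int 5 becomes 5.0 */
    assert(widened(-1) == -1L);
}
```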

Addresses in C/C++

The C/C++ languages, like many others, abstract machine code operations, by allowing the pro­gram­mer to choose types for variables and values. The compiler will then emit the ap­pro­pri­ate in­struc­tions to access 1, 2, 4 or 8-byte values, based on the type. It also uses the type of a val­ue or vari­able, to allocate the correct amount of space. This is a compile-time op­er­a­tion; once compiled, there are no more types, just pure machine code, i.e.: statically typed.

Types & Pointers

Abstractions ease the cognitive process in programming environments, and a type is an abstraction — it makes a programmer's life easier. Different languages have dif­fer­ent levels, and kinds of, abstractions. Some higher level languages tend to hide the numeric na­ture of ad­dres­ses behind abstractions called references.

Pointers

In C, the abstraction of an address is elementary, and is called a pointer. Formally, we say that a pointer type is a derived type, simply because the word ‟point­er” by it­self, is just a concept. Application of the concept involves other, existing, types. This means, in a pro­gram, we can have ‘pointer to some type’, not just ‘pointer’. So saying ‘pointer’ is like saying ‘carnivore’ — it only categorises broadly, not qualifies un­am­bigu­ous­ly.

Definition 3: Pointer
The term pointer represents a classification for a group of derived types that ab­stract the con­cept: ‘the ad­dress of some type of value’.

Pointer Types

The pattern: T*, may be read as ‘T pointer’, where T can be any existing type — hence the as­ser­tion that pointers are derived types. Consequently, we then read, or pronounce, T** as ‘T pointer pointer’.

Some might prefer to pronounce it as ‘pointer to a pointer to a T’, which is slight­ly clos­er to its most verbose explanation: ‘the type of a value, that is the address of a value, which has the address of a T value’.

Since ‘T pointer pointer’ correlates with the read­ing di­rec­tion, maps one-to-one to the to­kens, and is the most succinct, we will persist with that pro­nun­ci­a­tion, except in one early example.

Definition 4: Pointer Type
The type T*, pronounced ‘T pointer’, or ‘pointer to T’, is a type that depicts the address of a T-type value, or in­formally: ‘means the address of a T value’.

Syntactically, whitespace, or the absence thereof, between the T and the *, or after the *, is not sig­nif­i­cant. However, we consciously join them together to help emphasise that it is a singular type, albeit a derived type.

Ex­am­ples of pointer types include the following: (we use the long­er de­scrip­tion in the com­ments, hoping that in the short term: ‘pointer to an int-type value’ might have more value than ‘int point­er’).

table: Pointer Type Descriptions
Type                  Description / Verbalisation
int*                  ‘pointer to an int-type value’, or: ‘int pointer’.
double*               ‘pointer to a double-type value’, or: ‘double pointer’.
unsigned long long*   ‘pointer to an unsigned long long-type value’, or: ‘unsigned long long pointer’.
int**                 ‘pointer to an int*-type value’; same as: ‘pointer to a pointer to an int-type value’, or: ‘int pointer pointer’.

Since int, long, etc., are existing types, we could derive some pointer types from them. Whe­ther we use existing types, or derive some types, variables can be created with the type:

c: Derived pointer types
int V;               //← `V` has type `int`.
int* P;              //← `P` has type `int*` (‘int pointer’)
int** Q;             //← `Q` has type `int**` (‘int pointer pointer’)

Considering that int* is a type, albeit a derived type, we can in turn derive a pointer type from it: int**. And since that is also a type, we can derive another pointer type from it as well: int***, ad in­fini­tum.

Variables

The C programming language provides syntax to represent a value in memory. The rep­re­sen­ta­tion al­lows operators to access the value, either by reading it, or changing it. The most com­mon method is to use a named variable to represent a piece of memory. The size of the memory is de­ter­mined by the type of the variable. This premise holds for any other ex­pres­sions that also re­present memory.

Definition 5: Variables Represent Memory
Given: T V = X;   ← variable called V, of type T, containing X.
Then:  In expressions, V represents the memory containing X, with type T.

REMINDER: A variable in C is a contiguous ‘chunk’ of memory, whose size is determined by its type, and is either re­pre­sen­ted by name, or referenced via a correctly typed pointer.

Lvalues & Rvalues

An expression, like a variable name, which represents memory, does not by itself affect the mem­o­ry; operators that utilise the expression, however, can read from, or write to, the mem­o­ry so re­pre­sented. In fact, operators that modify memory, require expressions that represent memory.

In C, we formally state that operators that write to memory, require lvalues, and those that read from memory, require rvalues. Lvalues can always be used as rvalues, but an rvalue, on the other hand, is not automatically an lvalue.

For example, a literal value 123 cannot be assigned to (it is not an lvalue), but its value can be used in an expression (it is an rvalue). Conversely, regarding V from the definition above, V can be assigned to, because it is a modifiable lvalue. V can also be used as an rvalue (its value is fetched). All variables, except array variables, are modifiable lvalues by default.

Constness

Normally, as we have seen, non-array variables are by default modifiable lvalues. We can override this by prefixing a variable's definition with the const qualifier. This makes the variable const-qualified, which for practical purposes makes it a read-only variable.

This sounds like a contradiction, but is not: it is handled and stored in exactly the same way as other variables, except that the compiler does its best to see that you do not create instructions to modify it. In other words, it checks that you do not use it as a modifiable lvalue. And yes, you can circumvent the compiler if you really want to.
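
A small sketch of const as a compile-time check rather than a different kind of storage. The names are illustrative, and the commented-out line shows the kind of statement a compiler rejects.

```c
#include <assert.h>
#include <stddef.h>

/* Promise not to modify the int that `p` points at. */
int twice(const int *p) {
    assert(p != NULL);
    /* *p = 0; */                /* rejected: *p is read-only here */
    return *p * 2;
}

void const_examples(void) {
    const int limit = 21;        /* stored like any other int */
    int plain = 4;
    assert(sizeof limit == sizeof plain);
    assert(twice(&limit) == 42); /* &limit has type const int*  */
    assert(twice(&plain) == 8);  /* int* converts to const int* */
}
```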

Void Pointers

The closest we can get to an address as just a number, in C, is by using a void pointer. This has type void*, which fortuitously, we pronounce ‘void pointer’. It is simply a pointer type that very few op­e­ra­tors can work with, but the few operations that are legal, can be valuable. These in­clude: as­sign­ment, passing as argument, the result of function return, and casting to any other pointer type.

No arithmetic or indirection can be performed on void pointers. Some compilers allow arithmetic on void pointers as an extension, but standard C does not permit it. They are most commonly used for functions that can work with any type of memory, like memcpy(), and memset().

Writing a function taking a void pointer as parameter, means it is easy to call, since you can pass it any address without a cast. However, inside the function, the void point­er will have to be cast to another pointer type, before any­thing prac­ti­cal can be done with it.
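
As a sketch of the pattern just described, the following hypothetical function accepts two void pointers and a size, and converts them to unsigned char* internally before doing any work. It is written in the spirit of memcpy(), not as a replacement for it.

```c
#include <assert.h>
#include <stddef.h>

/* Exchange `size` bytes between two non-overlapping objects. */
void swap_objects(void *a, void *b, size_t size) {
    assert(a != NULL && b != NULL);
    unsigned char *pa = (unsigned char *)a;   /* byte view of first object  */
    unsigned char *pb = (unsigned char *)b;   /* byte view of second object */
    for (size_t i = 0; i < size; ++i) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

A call such as swap_objects(&x, &y, sizeof x) needs no cast at the call site, whatever the (matching) types of x and y are.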

Obtaining Pointers in C

C provides four ways in which a programmer can manifest values of a pointer type:

The last two are special cases; they have their own specialised rules, and even in­volve ex­cep­tions to established rules.

Pointer Type Categories

C provides three categories of pointer types, which affect syntax, and the operators which can be used:

The second occurs whenever we take the address of an array. The third is simply the type that all function names result in. As always, any T or type, can be a combination of other types, in­clud­ing more point­er types.

Pointer & Type Syntax Complexity

The following is entirely legal, but not considered recreational reading, and should probably be skipped at first. It is shown here simply to highlight the deficiencies in the syntax chosen for types, which is fine at a simple level, but not so much when types are combined.

c: Gibberish
double*(*(*X[3])(int(*)(void)))(long);      //←define `X`.
double d = *X[i](f)(123L);                  //←use `X`, store result.

X is an array of 3 elements, which are pointers-to-functions, each taking a pa­ra­me­ter of (point­er to func­tion, taking void pa­ra­meters, re­turn­ing int), re­turn­ing a pointer to a func­tion taking a long as pa­ra­me­ter, re­turn­ing a double*.

The explanation from cdecl.org is as follows: “declare X as array 3 of pointer to function (pointer to function (void) returning int) returning pointer to function (long) returning pointer to double”, while ChatGPT-4 has this explanation:

X is an array of 3 pointers to functions. Each function takes one argument — a pointer to a func­tion that takes no ar­gu­ments and re­turns int. These func­tions return a point­er to anoth­er func­tion that takes a long as ar­gu­ment and re­turns a point­er to double. Array of Function Pointers.

ChatGPT-4, May 24 2023

Of course, some functions must still be defined, and the actual elements of X given values, but it is workable, and conceptually, not so complex — the type syntax is the problem:

Pure poetry of the highest order, but complex enough that the use of such code should be discouraged (at least when not using typedef to simplify the syntax).
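
As a hedged illustration of the typedef remark, the declaration of X can be rebuilt from three aliases. The alias and function names below are invented for this sketch. Declaring X twice at file scope is only a trick that lets the compiler confirm both spellings denote the same type.

```c
#include <assert.h>
#include <stddef.h>

typedef int IntSource(void);             /* function (void) returning int     */
typedef double *Finisher(long);          /* function (long) returning double* */
typedef Finisher *Stage(IntSource *);    /* function (IntSource*) returning   */
                                         /* pointer to Finisher               */

double *(*(*X[3])(int (*)(void)))(long); /* the original spelling             */
Stage *X[3];                             /* compatible redeclaration          */

static double last_result;

static int seven(void) { return 7; }

static double *halve(long n) {
    last_result = (double)n / 2.0;
    return &last_result;
}

static Finisher *choose(IntSource *source) {
    assert(source != NULL);
    (void)source();                      /* the argument is callable */
    return halve;
}

double use_x(long n) {
    X[0] = choose;
    return *X[0](seven)(n);              /* same shape as: *X[i](f)(123L) */
}
```

With the aliases in place, the definition of X reads as 'array of 3 pointers to Stage', and each layer can be understood on its own.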

Address-Of Operator

The most obvious way to obtain a pointer value, is by using the address-of operator: & (am­per­sand). It is a prefixed unary operator. Its operand can be any expression that has an ad­dress. All lvalues have addresses; rvalues don't.

Even though an array expression is not a modifiable lvalue, it nevertheless has an address. An array's elements are modifiable lvalues, unless they are also arrays. When defined as const, the elements are read-only. Literals, on the other hand, or the results of most operators, cannot be used as operands to the address-of operator.

Definition 6: Address-Of Operator
Given: T V;          ← variable named V of type T.
and:   T A[N];       ← array of N values of type T.
Then:  &V    -t→ T*, i.e. ‘address of V has type T pointer’.
and:   &A    -t→ T(*)[N], i.e. ‘address of A has type pointer-to-array of N values of T’.
Also:  &A[0] -t→ T*  ← address of first element.

Take note that &A (address of array) is not equivalent to &A[0] (address of first element) — they do not have the same type, even though they have the same value. Also, one can take the ad­dress of a const variable. Given: const T V = Ec;, then the result of &V will be: const T* (or T const * if you like), instead of T* as above.
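
The difference between &A and &A[0] shows up as soon as 1 is added to each: both start at the same address, but they step over different amounts of memory. A minimal sketch, assuming nothing beyond standard C; the function names are hypothetical, and pointer addition is used here only as a measuring stick.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 4 };
static int A[N];

/* Non-zero when &A and &A[0] hold the same address value. */
int same_address(void) {
    return (void *)&A == (void *)&A[0];
}

/* Bytes skipped by adding 1 to &A, whose type is int(*)[N]. */
size_t step_of_array_pointer(void) {
    int (*whole)[N] = &A;
    return (size_t)((char *)(whole + 1) - (char *)whole);
}

/* Bytes skipped by adding 1 to &A[0], whose type is int*. */
size_t step_of_element_pointer(void) {
    int *first = &A[0];
    return (size_t)((char *)(first + 1) - (char *)first);
}

void address_of_examples(void) {
    assert(same_address());
    assert(step_of_array_pointer() == sizeof A);
    assert(step_of_element_pointer() == sizeof A[0]);
}
```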

Definition 7: Pointer Expression
Any arbitrarily complex expression, that produces a value of a pointer type, is called a pointer expression. This includes string literals and const pointers.

Since int* means ‘address of an int’, it seems reasonable that taking the address of an int variable with the address-of operator should produce an int*; and it does, which is why the assignment to P below is valid. The same goes for Q, a variable that can store an int** value: because P has type int*, taking its address yields an int** value, which can be stored in Q.

c: Addresses of variables are pointers
  int V = 123;  //← variable `V`, storing `123`, of type `int`.
  int* P = &V;  //← variable `P`, storing `ADR_OF_V`, of type `int*`.
  int** Q = &P; //← variable `Q`, storing `ADR_OF_P`, of type `int**`.

In the illustration below, the variable P occupies 4 bytes, and by coincidence, so does the vari­able V — they are not necessarily the same size on all architectures. A lit­tle-en­di­an ar­chi­tec­ture stores the low byte of a value, in low memory, so the hexadecimal value in P is 0x0000014A, and that of V, is 0x00003039. In a big endian architecture, the bytes in memory would have been re­versed, but the values would remain the same.

figure: Pointer Representation in Memory

The actual addresses of the variables, i.e., the results of the address-of operators, depend on the com­pi­ler, architecture, compiler options, memory model, and numerous other fac­tors, maybe even the time of day, or phase of the moon. In C, we rarely care about the ac­tu­al nu­mer­i­cal value of an ad­dress — just that it is a valid numerical value, and in this case, will be the correct address for the first byte of the variable.

Printing Pointers

The only portable way to print the value of a pointer with printf requires the %p formatting sequence. It expects a value of type void* though, so in order to avoid warnings, cast the pointer expression passed as argument to void*:

   printf("&I = %p\n", (void*)&I);

Remember that ‘pointer expression’ means: an arbitrarily complex expression, resulting in a final value having a pointer type.
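
A sketch of the same rule wrapped in a helper, using snprintf so that the text can be inspected. The helper name is made up for this example. The exact text produced by %p is implementation-defined, so no particular output is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format any object pointer into `buffer`; returns snprintf's result. */
int format_address(char *buffer, size_t size, const void *address) {
    assert(buffer != NULL && size > 0);
    return snprintf(buffer, size, "%p", (void *)address);
}
```

A call such as format_address(text, sizeof text, &I) needs no cast, because any object pointer converts implicitly to const void* when passed to a prototyped function. The cast is only required for variadic arguments such as those of printf.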

Pointer Variable Definitions

Since C and C++ allow multiple variables to be defined in the same statement, some confusion is possible when defining multiple pointer variables like this. The asterisk unfortunately binds to the identifiers of the variables, and not to the type.

c: Multiple variable definitions in same statement
int i, * p = &i;         //←`i` -t→ `int`, `p` -t→ `int*`.
int* p, i;               //←`i` -t→ `int`, `p` -t→ `int*`.
int i, *p = &i, *q = &i; //←`i` -t→ `int`, `p` & `q` -t→ `int*`.
int* p, i, * q = &i;     //←`i` -t→ `int`, `p` & `q` -t→ `int*`.

Spaces around the asterisk are immaterial and of no se­man­tic value. Where i is de­fined be­fore p or q, its ad­dress can be used to ini­tia­lise the point­er variables.

This strange situation will not arise when creating a type alias for a pointer type. An alias is always a single name, and will apply to all variables when used in a mul­ti­ple vari­able de­fi­ni­tion state­ment.

c: Multiple variable definitions with type alias
typedef int* IP;         //←`IP` ≡ `int*`.
IP i, p;                 //←`i` & `p` -t→ `int*`.
IP p, i;                 //←`i` & `p` -t→ `int*`.
IP i, p, q;              //←`i` & `p` & `q` -t→ `int*`.
IP p, i, q;              //←`i` & `p` & `q` -t→ `int*`.

A type alias created with typedef, is not a new type. It is simply a sy­no­nym for an ex­ist­ing type. Do not create type sy­no­nyms with macros, espec­ial­ly not for pointer types.
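
The warning about macros can be illustrated with a sketch: a macro is substituted textually, so the asterisk again binds only to the first identifier. INT_PTR_MACRO and IntPtr are names invented for this example.

```c
#include <assert.h>

#define INT_PTR_MACRO int*       /* textual substitution, not a type */
typedef int *IntPtr;             /* a genuine alias for int*         */

void macro_pitfall(void) {
    int value = 7;
    INT_PTR_MACRO a = &value, b = value;  /* int* a = &value, b = value; */
    value = 8;
    assert(*a == 8);             /* a is an int*, it follows value   */
    assert(b == 7);              /* b is a plain int holding a copy  */
}

void alias_behaves(void) {
    int value = 7;
    IntPtr a = &value, b = &value;        /* both are int* */
    value = 8;
    assert(*a == 8 && *b == 8);
}
```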

Array Variables and Expressions

The C array derived type has the form T[N], where T can be any non-abstract type (it cannot be void or a function type), and N is a constant expression (const-expr) representing the number of elements, or count. When using an array type to define variables, the type must surround the variable: T V [N];. Spacing between V and [N], or around N, is immaterial.

Array variables are somewhat special — any expression representing an array, by array vari­able name, or other means, will result in the ad­dress of the first element. As a con­se­quen­ce, an array nev­er represents the complete collection of values that it contains. The result is that arrays can­not be moved around as a whole (cannot be assigned, re­turned from func­tions, or passed as ar­gu­ments). It is not as limiting as it sounds — pointer operations are powerful.

Definition 8: Array Variables in Expressions
Given: T A[N];       ← array variable A, containing N values of type T.
Then:  A ≡ &A[0]     ← decays to the address of the first element.
Thus:  A -t→ T[N], which becomes T*.
Note:  T[N]          ← used for memory allocation only; becomes T* in expressions.

The type of A is still T[N], meaning A is an array, as evidenced by taking its size: ‘sizeof A’, which will be equal to sizeof(T[N]). When represented in an expression, except as operand to the sizeof or address-of (&) operators, it results in, or decays to, a pointer, in this case: T*.

It is important to understand that ‘T[N]’ means ‘an array of N elements of type T’, which imparts more information than, for example, ‘T*’, which means ‘address of a T value’. The implicit conversion from ‘T[N]’ to ‘T*’ represents a loss of information, hence the term: decay.
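
The loss of information can be observed with sizeof: applied to the array, it reports the whole object, while a function that receives the decayed pointer can only see the size of a pointer. A minimal sketch with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* By the time `p` arrives here, the element count is gone. */
size_t size_seen_by_callee(const int *p) {
    return sizeof p;             /* size of a pointer, not of the array */
}

void decay_examples(void) {
    int A[10] = { 0 };
    assert(sizeof A == 10 * sizeof(int));             /* no decay under sizeof */
    assert(size_seen_by_callee(A) == sizeof(int *));  /* A decayed to int*     */
}
```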

The first element of an array A, can be ex­pres­sed in code as A[0]. It re­pre­sents the first T val­ue. The ad­dress of the first el­e­ment, can thus be ex­pres­sed as: &A[0]. In other words then, as an ex­pres­sion, A is equi­va­lent to &(A[0]), yield­ing a T* value, where T is the type of the first el­e­ment of A. Due to op­e­ra­tor pre­ce­dence, we can short­en the rule to: A \equiv &A[0]. Both will be le­gal ex­pres­sions, but it would never be ne­ces­sa­ry to write &A[0], and be­cause of point­er arith­me­tic, and sub­script op­e­ra­tor rules, the following equivalences are all valid:

A   ≡   &A[0]   ≡   &0[A]   ≡   A + 0   ≡   0 + A

If the subscript operator is overloaded in C++, &0[A] will not be equivalent, and if addition is overloaded on the type of A, the commutativity of addition will not necessarily hold.

c: Array variables decaying to pointers
int A[3] = { 11, 22, 33 };    //←initialised sequence of `int` values.
int* P = A;                   //←`P` = address of first element of `A`.
int* Q = &A[0];               //←redundant, since `A` ≡ `&A[0]`
if (A != &A[0])
   printf("It's the end of days!\n");
if (sizeof A != sizeof(int[3]))
   printf("Life as we know it, has ceased.\n");

In short: The type of the variable A above is int[3]; the type of the expression A is int*.

The same operations applicable to A, above, are also applicable, and work exactly the same, when ap­plied to P, since both A and P have the same type and value in an expression. They are not re­mote­ly the same kind of variable: the one is an array variable, the other is a point­er vari­able, by clas­si­fi­ca­tion.

Operators never care where the value of their operands orig­i­nat­ed: whether it was a literal, a variable, or the result of a previous operator. They only care about the operand types.
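
Because operators see only the type and value of an operand, a function written for int* works unchanged whether it is handed the array expression A or the pointer variable P. The function below is an illustrative sketch, not part of the original examples:

```c
#include <assert.h>
#include <stddef.h>

/* Sum `count` ints starting at `first`. */
int sum(const int *first, size_t count) {
    assert(first != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; ++i)
        total += first[i];       /* subscripting works on any int* */
    return total;
}

void same_in_expressions(void) {
    int A[3] = { 11, 22, 33 };
    int *P = A;
    assert(sum(A, 3) == sum(P, 3));   /* both arguments are int* values */
    assert(A[1] == P[1]);
}
```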

Sizeof Operator and Arrays

The sizeof operator does not treat the expression A, as a pointer. It is an exception, and on­ly be­cause the definition of A is within scope. It is fortuitous and for our convenience. For ex­amp­le, the ex­pres­sion: ‘sizeof A / sizeof A[0]’, or ‘sizeof A / sizeof *A’, will result in the number of elements in the array. It is often wrapped in a macro:

c: Array size macro
#define ARRAY_SIZE(arr) (sizeof(arr)/sizeof(*(arr)))
···
   int total = 0, data[] = { 11, 22, 33 };
   for (int i = 0; i < ARRAY_SIZE(data); ++i)
      total += data[i];
   printf("Sum of data = %d\n", total);

Because we did not specify the size of the array called data, we can easily add more ini­tia­li­sers, or re­move some, without worrying about maintaining a macro with the size of the array, which is what of­ten happens.

Function Pointers

Although function types are not data types, the name of a function results in the address of the func­tion in the code, which is a data type. This is called a function pointer, or verbosely, pointer to a func­tion. Since we have defined the term pointer to be synonymous with an as­so­ci­at­ed type, it follows that ‘function pointer’ is only a classification; it be­comes con­crete when it has an actual type.

Definition 9: Function Pointer
Given: T F (P)   ← function F, taking P parameters, returning T.
Then:  F ≡ &F ≡ *F, each of type T(*)(P): ‘pointer to function taking P parameters, returning a T’.
Thus:  Functions represented in an expression, result in function pointers.

The only legitimate way, therefore, to obtain a function pointer in C, is to use the name of a func­tion. Other languages, like C++11, allow for anonymous functions as expressions. These ex­pres­sions are for­mal­ly called lambdas, so you might hear that some language supports, or does not sup­port, lamb­das. Well, C does not support lambdas, which is a shame, but only a minor in­con­ven­ience.

The names of the parameters are not relevant in the function pointer type. Point­er arith­metic can­not be performed on function pointers.

fpdemo01.cFunction Pointer Example 01
/*!@file  fpdemo01.c
*  @brief Function Pointer Demonstration 01
*/
#include <stdio.h>
#include <stdlib.h>               //← for `EXIT_SUCCESS`.

// divide an `int` by `10` with rounding, returning a `long`.
//
long F (int p) {
   return ((long)p * 10L + 50L) / 100L;
   }

// divide an `int` by `10` without rounding, returning a `long`.
//
long G (int p) {
   return (long)p / 10L;
   }

int main (void) {

   long (*P)(int);

   P = F;   printf("%ld\n", P(95));   //← `P(95)` calls `F(95)`.
   P = G;   printf("%ld\n", P(95));   //← `P(95)` calls `G(95)`.

   return EXIT_SUCCESS;
   }

The simplistic example above shows that the variable P can store any value, as long as the value has type long(*)(int), i.e., ‘pointer to function taking one int parameter, returning a long’. That is the type of F and G in expressions, so we can assign either to P.

Code involving function pointers is significantly simplified by using a typedef. It creates type aliases. Here is the same pro­gram, with the addition of two user-defined type aliases, called FT, which is an alias for long(int), and FTP, which is simp­ly an alias for long(*)(int):

fpdemo02.cFunction Pointer Example 02
/*!@file  fpdemo02.c
*  @brief Function Pointer Demonstration 02
*/
#include <stdio.h>
#include <stdlib.h>               //← for `EXIT_SUCCESS`.

typedef long FT(int);       //←`FT*` ≡ `long(*)(int)`.
typedef long (*FTP)(int);   //←`FTP` ≡ `long(*)(int)`.

// divide an `int` by `10` with rounding, returning a `long`.
//
long F (int p) {
   return ((long)p * 10L + 50L) / 100L;
   }

// divide an `int` by `10` without rounding, returning a `long`.
//
long G (int p) {
   return (long)p / 10L;
   }

int main (void) {

   FTP P;
   FT*  Q;

   Q = P = F;   printf("%ld,%ld\n", P(95), Q(95)); //←calls `F(95)`.
   Q = P = G;   printf("%ld,%ld\n", P(95), Q(95)); //←calls `G(95)`.

   return EXIT_SUCCESS;
   }

Note that we could have created the FTP alias in terms of FT: typedef FT* FTP;

Using the FTP type, it becomes trivial to pass function pointers, return function pointers, or store them in arrays:

cUsing pointer-to-function aliases
typedef long FT(int);        //← `FT*` ≡ `long(*)(int)` (1)
typedef long (*FTP)(int);    //← `FTP` ≡ `long(*)(int)` (2)
···
// function returning a function pointer.
FT* FRP (void);                       //← using typedef (1)
FTP FRP (void);                       //← using typedef (2).
long (*FRP(void))(int);               //← not using typedef.

// function taking a function pointer as argument.
void FAP (FT* parm);                  //← using typedef (1).
void FAP (FTP parm);                  //← using typedef (2).
void FAP (long (*parm)(int));         //← not using typedef.

// array of function pointers.
FT* AFP[2] = { F, G };                //← using typedef (1).
FTP AFP[2] = { F, G };                //← using typedef (2).
long (*AFP[2])(int) = { F, G };       //← not using typedef.

// function returning, and accepting a function pointer.
FT* FFF (FT* parm);                   //← using typedef (1).
FTP FFF (FTP parm);                   //← using typedef (2).
long (*FFF(long (*parm)(int)))(int);  //← ouch.

Like all operators, the function call operator has no idea from where its function pointer op­er­and orig­i­nat­es. It simply does its job, regardless. Some programmers, with convoluted ra­tio­nal­i­sa­tions, will write: P = &F; instead of P = F;, as if they are somehow different. The com­pi­ler will allow this syntax, but it is ignored. Similarly, they write: (*P)(95), instead of P(95), as if it's special. Again: it is al­lowed, but the com­pi­ler completely ignores it. In fact, you can even write: (*F)(95), and it will also be ignored, and be treated as F(95).

If you're not convinced, consider what set of rules, other than those mentioned, would explain why this will compile, and call F(), every time:

cIrrelevant indirection on pointer-to-functions
long F (int parm) { ··· }
long (*P)(int) = F;                  //← or `… = &F;` if you like.
···
   (*****F)(123);   P(123);          //← calls `F()`.
   (***F)(123);  (*P)(123);          //← calls `F()`.
   (*F)(123);  (***P)(123);          //← calls `F()`.
   F(123);   (*****P)(123);          //← calls `F()`.

The function call operator gets the same value, and the same type for any of these ex­pres­sions, and will perform the same task on each — call F().

If you would like to see even more pathologically insane variants of the above, consider the following, which assumes that the definitions of F and P above are still in scope:

cIrrelevance to the extreme
   (****(***(**(*F))))(123);        //← still calls `F()`.
   (****(***(**(*P))))(123);        //← still calls `F()`.
   (&*&*&*&*&*&*&*&F)(123);         //← still calls `F()`.
   (*&*&*&*&*&*&*&*&P)(123);        //← still calls `F()`.

It does not matter how you represent a function, by name or pointer variable, it will always just be a function pointer — which can only be called with the function call operator, stored in a variable, passed as argument, or returned from a function; whether you unnecessarily ap­ply in­di­rec­tion to it or not.

Literal Strings

Literal strings (e.g. "ABC" or L"ABC") decay to pointers to their first characters. Since each character has type char, or wchar_t, that means the type is char* or wchar_t* respectively. In C++, they decay to have types const char* and const wchar_t* respectively. The fact that they are not const pointers in C, does not mean it is portable to write to the string location.

Technically, a literal string results firstly in an array of char or wchar_t. The number of elements will be the count of characters plus 1 for a terminal NUL character (value 0):

"ABCDE"   has type   char[6],   decays to   char*     ← C
"ABCDE"   has type   const char[6],   decays to   const char*     ← C++

cLiteral string types and their decays
printf("%zu\n", sizeof "ABCDE");       //⇒6
printf("%zu\n", sizeof(char[6]));      //⇒6

Fur­ther­more, duplicate & iden­ti­cal literal strings are allowed to share the same space. Al­though it has no real impact on pro­grams, a literal string can be considered an array of chars or wchar_ts, but since representing it will result in a pointer, it is rather a moot point.

The important point here is that, conceptually, programmers tend not to think of a literal string as a pointer. They think of it as a ‘string’, which it is, and which it isn't, depending on your per­spec­tive. But regardless of perspective or assumption, it results in a pointer, and hence all op­e­ra­tors that work with pointers, will work with literal strings.

cLiteral string shenanigans
   putchar(     *"ABCDEF"     ); //← indirection,                  ⇒ `A`
   putchar(      "ABCDEF"[2]  ); //← subscript (indirection),      ⇒ `C`
   putchar(    2["ABCDEF"]    ); //← subscript (indirection),      ⇒ `C`
   putchar(    *("ABCDEF" + 2)); //← ptr arithmetic & indirection. ⇒ `C`
   putchar(*(2 + "ABCDEF")    ); //← ptr arithmetic & indirection: ⇒ `C`
   char* P  =    "ABCDEF";       //← store pointer in `P` variable.
   char S[] =    "ABCDEF";       //← exception. not a literal string:
   char T[] = {'A','B','C','D','E','F','\0'};  //←equivalent to this.

We have not yet discussed the intricacies of indirection, pointer arithmetic, and the sub­script op­er­a­tor, so do not get too concerned about that. The code is only to prove that literal strings result in pointers — in other words, the code is syntactically correct, will compile, run, and pro­duce the same results, on all con­form­ant C compilers.

In C++, the type of a literal string is: const char[N], decaying to: const char*. You should also treat literal strings in C as if they are const pointers, as a matter of good programming convention. In other words, instead of:

char* ident = "ABC";

rather use:

const char* ident = "ABC";

Writing to the memory occupied by the characters of a string literal is undefined behaviour.

Operations on Pointers

Pointer values are often passed to functions as arguments, so that functions may have the opt­ion to modify the value at that address, without the function needing to know the name of that value. In some other languages, the compiler may do it automatically, and then it is called ‘pass by ref­er­ence’. C has no syntax for this feature; we must manually pass addresses, should we need to.

Indirection Operator

In order to represent the value a pointer ‘points to’, we can use the indirection operator. Any ex­pres­sion, where the last operator to be performed is the indirection operator, is called an ‘in­di­rec­tion ex­pres­sion’.

Definition 10: Indirection Operator
Given: E has type T*   ← any expression E of type T*.
Then:  *E reads as ‘indirect E’.
Means: ‘represent the T value at address E’.
Type:  *E has type T.

Assuming the runtime value of the expression E above is ADR_OF_V, which is an address, we of­ten say ‘E points to ADR_OF_V’. Generally however, we know ADR_OF_V is the address of some variable, say V, in which case we will say: ‘E points to V’. Hence the term ‘pointer’.

Since we have defined a variable name to represent a value in memory, and we have also defined an indirection expression, like *E above, to represent a value in memory, it means that the same operations can be applied to either. Given the right values, it will be possible for an indirection to be a practical alias, in almost every respect, for some variable:

mini_indirect.cMinimal Indirection Example
#include <stdio.h>
int main (void) {
   int V = 123;                   //←value 123. assume `&V` is ADR_V.
   int* P = &V;                   //←value ADR_V. assume `&P` is ADR_P.
   int** Q = &P;                  //←value ADR_P. assume `&Q` is ADR_Q.

   #define _p(expr) printf("%*s = %p\n", -5, #expr, ((void*)(expr)))
   _p( &V   );                    //⇒ &V   = ADR_V 
   _p( &P   );                    //⇒ &P   = ADR_P 
   _p( &Q   );                    //⇒ &Q   = ADR_Q 
   _p( &*P  );                    //⇒ &*P  = ADR_V
   _p( &*Q  );                    //⇒ &*Q  = ADR_P
   _p( &**Q );                    //⇒ &**Q = ADR_V
   _p( P    );                    //⇒ P    = ADR_V
   #undef _p

   #define _p(expr) printf("%*s = %d\n", -5, #expr, expr)
   _p( V    );                    //⇒ V    = 123
   _p( *P   );                    //⇒ *P   = 123
   _p( **Q  );                    //⇒ **Q  = 123
   #undef _p
   }

From the C precedence table, notice that the address-of and indirection operators have the same level of precedence, but they associate with their operands from right to left. Con­se­quent­ly, in the ex­pres­sion &*P, in­di­rec­tion is performed first, which represents memory, and is exactly what the ad­dress-of op­e­ra­tor re­quires: it can only take the address of an expression that re­pre­sents memory.

Empirically, one can see by the output that *P is effectively (not entirely), an alias for V, just like *Q is effectively an alias for P, and **Q consequently also an alias for V. That supports the above definitions, so it should be no surprise. This will remain true for as long as P contains the address of V. If P is modified to contain the address of another variable, say W of type int, then from that point on, *P will be an alias for W.

The example above is not practical code, since there is no point in taking the address of a variable, and storing it in another variable in the same scope. This is not illegal though, so we used it to illustrate the fundamentals of indirection. In practical programs, the most common reason for taking an address, is to pass it to a function, so that the function may modify the variable via indirection.

cFunction with pointer type parameter
extern void tripple_it (int* parm);  //←declaration.
···
   int var1 = 12, var2 = 20;
   printf("var1 = %d\n", var1);      //⇒ 12
   tripple_it (&var1);
   printf("var1 = %d\n", var1);      //⇒ 36
   printf("var2 = %d\n", var2);      //⇒ 20
   tripple_it (&var2);
   printf("var2 = %d\n", var2);      //⇒ 60
···
void tripple_it (int* parm) {
   *parm *= 3; //← `*parm` is an ‘alias’ for value at the address in `parm`.
   }

The same machine code in tripple_it() can now modify any variable whose address has been passed, by using the indirection operator, without ever knowing the names of the vari­ables, or em­ploy­ing vari­ables defined on the external level.

Another reason we pass pointers to functions, is when we have no choice. This is when, con­cep­tu­al­ly, we want to pass an array. We say conceptually, because we have now established that an array can­not be used as a complete unit, or ‘chunk’ of memory.

Returning Pointers

Functions can return pointers, though this should be used with care: for example, a function should never return a pointer to a local variable, or parameter. If a parameter is a pointer, it will be safe to return the value of the parameter. The lifetime and ownership of the value referenced by the pointer returned, should be well documented.

cFunction returning a pointer
long* frp (void) {
   static long global_lifetime_var = 0L;
   return &global_lifetime_var;
   }
···
   long l = *frp();
   printf("1) *frp() = l = %ld" "L %ld" "L\n", *frp(), l);
   l = *frp() = 123L;
   printf("2) *frp() = l = %ld" "L %ld" "L\n", *frp(), l);
   l = frp()[0] = 234L;
   printf("3) *frp() = l = %ld" "L %ld" "L\n", *frp(), l);

   long* p = &*frp();
   *p = 345L;
   printf("4) *p = *frp() = %ld" "L %ld" "L\n", *p, *frp());
1) *frp() = l = 0L 0L
2) *frp() = l = 123L 123L
3) *frp() = l = 234L 234L
4) *p = *frp() = 345L 345L

It is safe for frp to return the address of a static local variable, since it has a global lifetime.

Const Pointers

The problem now is that we may want to pass the array so that the function can read from it; we don't want the function to write to the array. Of course, we can hope the function will not mo­di­fy it. But better to be safe than sorry, which brings us to the concept of ‘read-on­ly point­ers’ or, more for­mal­ly: ‘const pointers’.

Definition 11: Const Pointers
Given: const T* P;   ← pointer to a value of type const T.
Same:  T const* P;   ← pointer to a value of type const T.
Then:  *P has type const T   ← means it can only be read.

The following definition of P has the same effect: T const *P;, but most programmers use the first ver­sion in the definition above. Here, this pattern is used in an example:

cIteration using const pointer parameters
#define ARRSZ(a) (sizeof(a)/sizeof(*(a)))
extern int sum (const int* beg, const int* end);
···
   int data[] = { 11, 22, 33, 44 };
   int total = sum(data, data + ARRSZ(data));
   printf("Sum of data = %d\n", total);
···
int sum (const int* beg, const int* end) {
   assert(beg && end && beg < end); 
   int result = *beg++;
   while (beg != end)
      result += *beg++;
   return result;
   }

Of course, you could have used another algorithm and parameters for sum(), but you will still have at least one parameter of type const int*. We show such an example below, but also explain one more rule:

Array Parameter Optional Syntax

To aid readability, C allows one to optionally define pointer parameters as if they are arrays.

Definition 12: Array Syntax for Parameters
When, and only when, a ‹param›eter of a function is declared or defined, then:
‘T* ‹param›’   ≡   ‘T ‹param›[]’, and
‘const T* ‹param›’   ≡   ‘const T ‹param›[]’.
‘T (*‹param›)[N]’   ≡   ‘T ‹param›[][N]’. Also
‘const T (*‹param›)[N]’   ≡   ‘const T ‹param›[][N]’.
Constant integer expressions between the square brackets, are allowed, but ignored.
The N for the pointer-to-array type, is required however (part of type).

This rule exists solely so that programmers may convey more meaning: ‘abstractly, expecting an array’. It does not change any be­ha­viour. If a function is expecting an ar­ray, seeing that a parameter is of type T [], is more mean­ing­ful to a reader than seeing it is of type T*, which could just as well be the ad­dress of a single val­ue.

Any constant expression between the square brackets has absolutely no meaning; it is simply ignored by the compiler. It does not even have to match the existence, or not, of such a value in the definition of the function and its declaration.

It is considered a good coding convention to use this rule, where ap­plic­able:

cArray parameter is a pointer
#define ARRSZ(a) (sizeof(a)/sizeof(*(a)))
extern int sum (const int arr[], size_t count);
···
   int data[] = { 11, 22, 33, 44 };
   int total = sum(data, ARRSZ(data));
   printf("Sum of data = %d\n", total);
···
int sum (const int arr[], size_t count) {
   int result = 0;
   for (size_t i = 0; i < count; ++i)
      result += arr[i];
   return result;
   }

If, for some reason, you wanted a pointer variable or parameter, to also be const, not just what it is pointing to, you can use const twice:

   int const * const p = some_value;

Now both p and *p result in const types, and that is why p has to be initialised with some value — last chance you'll get. It could also have been writ­ten as follows, with the same semantics:

   const int* const p = some_value;

If only p has to be const, and not what it points to, i.e. *p:

   int* const p = some_value;

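In the last case, p must be initialised, since it can never be assigned to afterwards. C++ rejects the definition without an initialiser; a C compiler accepts it, but p is then stuck with whatever value it started with.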

Passing Pointers for Speed

The only other reason we pass pointers to functions, is when passing by value would be too ex­pen­sive, as may be the case for large structure type values. Passing an address could be much faster. Again, we should make it a const pointer, if we do not want the func­tion to mo­di­fy the mem­bers of the struct.

cStruct pointer as parameter
typedef struct ST {
   int member;
   // ··· additional members ···
   } ST;

void F (const ST* parm) {
   printf("(*parm).member = %d\n", (*parm).member);
   printf("  parm->member = %d\n",   parm->member);
   }
···
   ST V = { 123 };
   F(&V);               //⇒ … = 123 … = 123

We have to parenthesise the (*parm) expression, which represents the actual value parm points to, since the indirection operator (*) has lower precedence than the mem­ber se­lec­tion op­e­ra­tor (.). The member se­lec­tion operation will not work on pointers, hence the need to represent the struct first, then we can select a member from the representation.

Indirect member selection, like subscript, is a shortcut operator. The ex­pres­sion S->M is preferred, but is synonymous with (*S).M. It is therefore also an in­di­rec­tion ex­pres­sion, and if S is an lvalue, so is S->M, assuming M complies with lvalue rules (not an array).

Pointer Arithmetic

It is possible to add or subtract (+/-) an integer type value to, or from, a pointer type value. This is a special case, and is called pointer arithmetic. In this ‘arithmetic’, 1+1 will not ne­ces­sa­ri­ly be equal to 2, nor will 2-1 necessarily be equal to 1.

Definition 13: Pointer Arithmetic
Given: E has type T*, and I has any integer type.
Then:  E + I ≡ I + E, and E - I, have type T*   ← commutativity¹ of + holds.
Value: E ± I * sizeof(T), read as ‘value of E plus-or-minus I times sizeof(T)’.

From the definition, we can see that - or + will not change the type of the pointer expression oper­and, so that the result will still be T*.

Conceptually, this means that a T*, can only point to T values, so when incremented, for ex­amp­le, it can only point to the next T value. If two T values are adjacent in memory, and E results in the ad­dress of the first, then E+1, will give us the next T, i.e., the address of the sec­ond T value, re­gard­less of the size of a T value. This can be extended to any se­quen­ce, so that E+5, for ex­amp­le, will re­sult in the address of the sixth T. Of course, although not ne­ces­sa­ry, E+0 is legal, and equal to E.

Pointer arithmetic exposes one of the biggest dangers in C. By using pointer arithmetic, any ad­dress can effectively be reached, whether that address is valid, or whether that address actually con­tains a value of type T, or not. The compiler cannot help us to stay within the bounds of our se­quen­ce of T values — it is just performing arithmetic. In the example below, we com­bine point­er arith­me­tic and in­di­rec­tion, which allows us to access elements of the ar­ray, using a point­er to the first element, stored in P, and an offset.

cPointer arithmetic (commutativity of addition)
   int* P = (int[3]){ 11, 22, 33 };        //← ptr. to seq. of 3 `int`s.
   printf("*(P + 0) = %d\n", *(P + 0));    //⇒ 11
   printf("*(P + 1) = %d\n", *(P + 1));    //⇒ 22
   printf("*(P + 2) = %d\n", *(P + 2));    //⇒ 33
   printf("*(0 + P) = %d\n", *(0 + P));    //⇒ 11
   printf("*(1 + P) = %d\n", *(1 + P));    //⇒ 22
   printf("*(2 + P) = %d\n", *(2 + P));    //⇒ 33

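The first statement uses a C99 ‘compound literal’ to create an unnamed array of 3 int values, and assigns the address of the first element to P. Inside a function, as here, that array has automatic storage duration and lives until the enclosing block ends; only a compound literal at file scope is static. An array compound literal acts just like an array represented by name: it results in the address of the first element.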

Arrays of Arrays

An element of an array could be another array, also known as ‘an array of arrays’, more com­mon­ly mis­re­pre­sented as a ‘multi-dimensional array’, leading to all kinds of mis­con­cep­tions. Talking of a multi-dimensional array, is useful only as an algorithmic abstraction — it does not represent C se­man­tics, which only involves types, and pointer arithmetic.

In the example below, you can easily replace ROW with T, and see how it compares with the de­f­i­n­i­tions above. Ultimately, M + 1 must result, as we have seen, in the address of the next ROW type (2nd element). For that to work, the compiler uses the type to calculate that offset. And the type of the el­e­ment is ROW, which is a synonym for int[3] (array of 3 ints). Thus, M + 1 gets calculated as:

M + 1 * sizeof(ROW), or, since ROW is a synonym for int[3], as:
M + 1 * sizeof(int[3]), which provides the correct address.
cArray-of-arrays with type alias
/* typedef version
*/ {
   typedef int ROW[3];        //←`ROW` is synonym for `int[3]`.
   ROW M[4] = {               //←`M` stores 4 `ROW` values.
      { 11, 12, 13 },         //←values for `M[0]` ‘first ROW’.
      { 21, 22, 23 },         //←values for `M[1]` ‘second ROW’.
      { 31, 32, 33 },         //←values for `M[2]` ‘third ROW’.
      { 41, 42, 43 }};        //←values for `M[3]` ‘fourth ROW’.

   ROW*  P     =  M;          //←`M` === `&M[0]`
   ROW (*Q)[4] = &M;          //←`&M` is ‘ptr-to-array of 4 `ROW`s’.

   printf("M    %p\n", (void*) M);
   printf("&M   %p\n", (void*) &M);
   printf("P    %p\n", (void*) P);
   printf("&P   %p\n", (void*) &P);

   printf("%d %d\n", *(*(M + 1) + 2)       , M[1][2]   );  //⇒ 23 23
   printf("%d %d\n", *(*(P + 1) + 2)       , P[1][2]   );  //⇒ 23 23
   printf("%d %d\n", *(*(*(Q + 0) + 1) + 2), Q[0][1][2]);  //⇒ 23 23
   }

Without using the ROW user-defined typedef, no operators need to change, only the syntax for the definition of the M variable. We add one additional twist: taking the address of Q, which will result in an int(**)[4][3] type.

cArray-of-arrays without type alias
/* fundamental type version
*/ {
   int M[4][3] = {
      { 11, 12, 13 },        //←values for `M[0]` ‘first int[3]’.
      { 21, 22, 23 },        //←values for `M[1]` ‘second int[3]’.
      { 31, 32, 33 },        //←values for `M[2]` ‘third int[3]’.
      { 41, 42, 43 }};       //←values for `M[3]` ‘fourth int[3]’.

   int (*P)[3] = M;          //←`M` === `&M[0]`
   int (*Q)[4][3] = &M;      //←`&M` is ‘ptr-to-array `int[4][3]`’.
   int (**R)[4][3] = &Q;     //←`*R` ≡ `Q`.

   printf(" M   %p\n", (void*)  M );
   printf("&M   %p\n", (void*) &M );
   printf(" P   %p\n", (void*)  P );
   printf("&P   %p\n", (void*) &P );
   printf(" Q   %p\n", (void*)  Q );
   printf("&Q   %p\n", (void*) &Q );
   printf(" R   %p\n", (void*)  R );
   printf("*R   %p\n", (void*) *R );
   printf("&R   %p\n", (void*) &R );

   printf("%d %d\n", *(*(M + 1) + 2)       , M[1][2]   );    //⇒23 23
   printf("%d %d\n", *(*(P + 1) + 2)       , P[1][2]   );    //⇒23 23
   printf("%d %d\n", *(*(*(Q + 0) + 1) + 2), Q[0][1][2]);    //⇒23 23

   printf("%d %d\n",*(*(*(*R + 0) + 1) + 2), R[0][0][1][2]); //⇒23 23
   printf("%d %d\n",*(*(*(*R + 0) + 1) + 2),(*R)[0][1][2]);  //⇒23 23

   printf("%d %d %d\n", **P, ***Q, ****R); //⇒11 11 11
   }

Take care to understand that M[0][0] ≡ **M, and P[0][0] ≡ **P, and Q[0][0][0] ≡ ***Q. Take M[0][0] as an example, which will result in: *(*(M+0)+0) and is calculated as follows:

*(*(M + 0 * sizeof(int[3])) + 0 * sizeof(int))  =  *(*(M + 0) + 0)  =  **M.

As arrays of arrays can become large quite quickly, and because the stack can be very limited in some environments, it is often more convenient to allocate the memory at runtime (dynamically), using the standard library, or a custom library. Here is a program similar to the above examples, but employing dynamic memory:

cDynamically allocated array-of-arrays
   int (*M)[3] = (int(*)[3])malloc(4 * 3 * sizeof(int));
                   // or: `…malloc(4 * sizeof(int[3]));`.
   if (!M) { // malloc returns null pointer on failure.
      fprintf(stderr, "No memory.");
      exit(EXIT_FAILURE);
      }
   M[0][0] = 11; M[0][1] = 12; M[0][2] = 13;
   M[1][0] = 21; M[1][1] = 22; M[1][2] = 23;
   M[2][0] = 31; M[2][1] = 32; M[2][2] = 33;
   M[3][0] = 41; M[3][1] = 42; M[3][2] = 43;

   printf("M[1][2] = %d\n", M[1][2]);          //←tidy expression, but is
   printf("        = %d\n", *(*(M + 1) + 2));  // calculated like this.
   ···
   free(M); //← important.

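Since C does not store metadata anywhere for arrays, it follows that for an array variable M (as in the earlier, non-dynamic examples), M, &M, and &M[0] will all produce the same address. The only difference is in the type of the address &M produces, as opposed to the type of M and &M[0], which in turn affects any pointer arithmetic applied to it. This does not hold for the pointer variable M in the dynamic example: there, &M is the address of the pointer itself.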

One of the biggest problems with dynamic memory allocation, is to remember to free() the mem­o­ry once done with it. This is easy to forget, or to miss on a return path, and is called a ‘mem­o­ry leak’. Sim­i­lar­ly, one must check the return value of malloc() for a fail­ure to al­lo­cate mem­o­ry. To continue without error checking, is looking for trouble, and in­di­cates slop­py or lazy programming.

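An alternative for C99, which avoids dynamic memory, is to use a compound literal. The compiler simply creates an unnamed array, and the expression results in a pointer to its first element. Note that the unnamed array only has static storage (global lifetime) when the compound literal appears outside the body of a function; inside a function it has automatic storage duration, like a local variable, and lives until the enclosing block ends: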

cC99 array-of-arrays compound literal
   int (*M)[3] = (int[4][3]){
      { 11, 12, 13 },
      { 21, 22, 23 },
      { 31, 32, 33 },
      { 41, 42, 43 }};
   // now we can use `M` algorithmically like a 2D array. the `1`
   // is the ‘row’ offset, and the `2` is the ‘column’ offset:
   printf("M{Row2,Col3} = %d\n", *(*(M + 1) + 2));    //⇒ `23`

All the arrays of arrays examples above use the same pointer arithmetic. The ex­po­si­tion be­low refers to any one of the above arrays of arrays examples, all referenced by M; since the types are the same, the same operators will produce the same results.

Instead of using the above C99/11 compound literal syntax, we could use the type int[4][3] with malloc(), then the code might look as follows:

cDynamically allocated array-of-arrays
   int (*M)[3] = (int(*)[3])malloc(sizeof(int[4][3]));
   int i, j;
   M[i=0][j=0] = 11;  M[i][++j] = 12;  M[i][++j] = 13;
   M[i=1][j=0] = 21;  M[i][++j] = 22;  M[i][++j] = 23;
   M[i=2][j=0] = 31;  M[i][++j] = 32;  M[i][++j] = 33;
   M[i=3][j=0] = 41;  M[i][++j] = 42;  M[i][++j] = 43;
   ···
   free(M); //← release the dynamically allocated memory.

To place initial values in the M ‘array’, is now more cumbersome, explaining the addition of com­pound literals to the C language.

Arrays of Arrays Pointer Arithmetic

Since M, in an expression, is pointing to the first element, which is an array, the type is a point­er-to-array, which we write as: int(*)[3], or ROW*, if using the synonym.

So, *(M + 1) represents the 2nd row, but since the second row is an array, it must result in a pointer to the first element: int*.

Assuming the result of *(M + 1) == R, then (R + 2) is the address of the 3rd element, and thus *(R + 2), i.e., *(*(M + 1) + 2) represents the 3rd element: 23, of the second element of M.

The following example does not add much more, but does try to show that an int[2][3] array (like A below), will result in a pointer-to-array: int(*)[3]. Given a variable of that type, like Q, the same op­e­ra­tors will give the same result on both A and Q; they are different kinds of vari­ables, but they have the same type, and in the example, the same val­ue in an ex­pres­sion.

cMore pointer-to-arrays and arrays-of-arrays
   // ‘P is a ptr-to-array of 3 elements of type `int`’, and the compound
   // initialiser, whose result is assigned to `P`, is:
   // ‘an array of 2 elements, each being an array of 3 elements of `int`.’
   //
   int (*P)[3] = (int[2][3]){ { 11, 12, 13 }, { 21, 22, 23 } };

   // in an expression, “`A` *results* in a ptr-to-array of 3 elements,
   // of type `int`”. Or: a pointer to an `int[3]`, which we cannot write
   // as `int[3]*`, we must write it as `int(*)[3]`.
   //
   int A[2][3] = { { 11, 12, 13 }, { 21, 22, 23 } };

   int (*Q)[3] = A;          //← `A` has type `int(*)[3]` here.

   // all `printf`s below, output `23`.
   //
   printf("*(*(A + 1) + 2) = %d\n", *(*(A + 1) + 2) );
   printf("*(*(Q + 1) + 2) = %d\n", *(*(Q + 1) + 2) );

   // `A[1]`, for example, represents an array (the second element), so it
   // must result in an `int*`, because the first element of the second
   // array, is an `int`.

   int* L = *(A + 1);         //← all good.
   //int** M = A;             //← illegal. will not compile. wrong types.

Pointer arithmetic is at the core of all array operations. Fortunately, as shown later, C pro­vides the sub­script operator, which allows for more concise expressions.

Pointer Difference

As a matter of interest, pointers can be subtracted from each other. The result has the type ptrdiff_t (from <stddef.h>), which is not an intrinsic type, but an alias for a signed integer type. Which one is ‘implementation-defined’, which means a compiler implementer can decide about the size, and therefore range, of the value. Not all possible differences may be representable, i.e., a result may be bigger than PTRDIFF_MAX (from <stdint.h>).

Caveat: pointer differences are only well-defined when both pointers point to elements of the same array, or of the same dynamically allocated contiguous block, where the position one past the last element also counts. That also implies that the pointer operands must be of the same type.

The return value is in terms of units of type T, where T is from the pointers' types: T*.

Subscript Operator

The subscript operator, or index operator, is actually simply shorthand, or ‘syntactic sugar’. In fact, it is very superficial shorthand, since a subscript operation is simply phys­i­cal­ly re­arranged into a pointer arithmetic and indirection expression before types are checked, or machine code gen­er­at­ed. This is crucial to accept and understand, otherwise the following will not make sense.

Definition 14: Subscript Operator Pattern
Given:     X[Y]   ← i.e., any pattern in this form, is…
Rewritten: *(X + Y)   ← before type checking and compilation.
Meaning:   X[Y]   ≡   *(X + Y)   ≡   *(Y + X)   ≡   Y[X]

This is transformed literally, so that X[Y] is equal to Y[X], just like *(X + Y) is equal to *(Y + X), which is what the first two patterns are translated to, respectively, anyway [2]. This is only a problem if you have preconceived ideas about the subscript operator. Since most C programmers are not aware of this definition, the convention is to persevere with the most ‘natural-looking’ version.

cNon-overloaded subscripting as syntactic sugar
   int A[3] = { 11, 22, 33 };
   int* P = A;                             //← `&A[0]` stored in `P`.
   printf("A[2]     = %d\n", A[2]     );   //← recommended pattern.
   printf("2[A]     = %d\n", 2[A]     );
   printf("*(A + 2) = %d\n", *(A + 2) );
   printf("*(2 + A) = %d\n", *(2 + A) );
   printf("P[2]     = %d\n", P[2]     );   //← recommended pattern.
   printf("2[P]     = %d\n", 2[P]     );
   printf("*(P + 2) = %d\n", *(P + 2) );
   printf("*(2 + P) = %d\n", *(2 + P) );

Now, when most junior C programmers are asked to find the address of the third element, they most likely will write: &A[2] (hopefully, they are not so junior, that they will write &A[3]). But it should be clear that it will give the same result as: A + 2.

The only reason you may want to use &A[2], as opposed to A + 2, is your conviction that it pro­vides more information to a reader or maintainer of your code. Also, prefer A[2] over *(A + 2), even if you know that is what C compiles, regardless of your abstraction.

Subscript Operator and Arrays of Arrays

Since the subscript operator is shorthand for pointer arithmetic, we can avoid man­u­al­ly ap­ply­ing pointer arithmetic. Consider a previous arrays of arrays example, rewritten to use the sub­script op­e­ra­tor:

cSubscripting arrays-of-arrays
int M[4][3] = {
   { 11, 12, 13 },
   { 21, 22, 23 },
   { 31, 32, 33 },
   { 41, 42, 43 }};
// now we can use `M` algorithmically like a 2D array. the `1`
// is the ‘row’ offset, and the `2` is the ‘column’ offset:
printf("M{Row2,Col3} = %d\n", M[1][2]);    //⇒ `23`

Since M[1][2] is firstly translated to: *(M[1] + 2), and M[1] subsequently rewritten as well, it leaves us with: *(*(M + 1) + 2), which is the expression that the previous example used to se­lect the 3rd element from the 2nd ‘row’. This is clearly the preferred syntax to use, as long as there is no doubt, that this is not abstract, but simply disguised pointer arithmetic.

Arrays of Arrays Alternative

Arrays of arrays are not convenient, mostly because the number of ‘columns’ must be con­stant, and part of the type. This makes it difficult to write generic functions with such types.

cArray of pointers
int _mem[4][3] = {
   { 11, 12, 13 },
   { 21, 22, 23 },
   { 31, 32, 33 },
   { 41, 42, 43 }};

int* M[4] = { _mem[0], _mem[1], _mem[2], _mem[3] };  //← array of `int*`.
int** P = M;

printf("M{Row2,Col3} = %d\n", M[1][2]);      //⇒ `23`
printf("P{Row2,Col3} = %d\n", P[1][2]);      //⇒ `23`

The addresses of the ‘rows’ stored in M, could have been dynamically allocated. To keep the code small, the example uses _mem to ‘allocate’ and initialise the memory. We could have used C99's compound literals instead, but the ‘rows’ would not be guaranteed to be contiguous, which could be problematic for certain algorithms:

cArray of pointers using compound literals
int* M[4] = {
   (int[3]){ 11, 12, 13 },
   (int[3]){ 21, 22, 23 },
   (int[3]){ 31, 32, 33 },
   (int[3]){ 41, 42, 43 }};

The point is that M is not an array of arrays — it is simply an array which happens to contain a list of int pointers. Consequently, selecting an element yields an int*, which we arranged to be the ad­dress of a sequence of 3 int values. Now we can use an additional subscript op­e­ra­tor to re­pre­sent an element in the ‘row’: M[row][col].

Memory for the array of pointers, and the elements can be allocated as one block. In this case, the num­ber of ‘rows’ and the number of ‘columns’ can be determined dynamically, depending on run­time re­quire­ments. We can arrange memory as follows:

figure: Array of Pointers Simulating 2D Array

There may be a gap between the array of pointers, and the memory for the actual elements, with­out affecting operations. Since both the number of ‘rows’ and the number of ‘columns’ can vary, both values must be transmitted to a function taking such a construct as parameter:

cFunction with parameter as array of pointers
int sum2d (int* arr[], size_t rows, size_t cols) {
   int total = 0;
   for (size_t r = 0; r < rows; ++r)
      for (size_t c = 0; c < cols; ++c)
          total += arr[r][c];
   return total;
   }

   // Dynamically allocate the ‘2D’ array (requires <stdlib.h>). `R` and `C`
   // can be variables determined at runtime from other sources. It takes a
   // few liberties regarding the alignment of `int*` and `int`, and does not
   // check whether the memory allocation succeeded. Also sets values for
   // all elements.

   size_t R = 4, C = 3;
   int** M = (int**)malloc(R * sizeof(int*) + R * C * sizeof(int));
   for (size_t r = 0; r < R; ++r) {
      M[r] = (int*)(M + R) + r * C;
      for (size_t c = 0; c < C; ++c)
         M[r][c] = (r + 1) * 10 + (c + 1);
      }
   printf("M[1][2] = %d\n", M[1][2]);
   printf("sum2d(M, R, C) = %d\n", sum2d(M, R, C));

Remember that, in this context, int* arr[] is equivalent to: int** arr, but is more descriptive for this particular situation. For better portability, the space for the ‘row’ pointers could have been allocated separately from the memory for the actual elements. The only danger with that option, is forgetting to free() both blocks of memory. C allows us to decide on tradeoffs.

If we wanted to protect the array elements from accidental modification in a function like sum2d(), we could have defined it as:

cConst pointer to const in array of pointers parameter
int sum2d (const int * const arr[], size_t rows, size_t cols) {
   ···
   arr[1][2] = 99; //← will fail to compile
   ···
   }
   … sum2d((const int * const *)M, R, C)

Unfortunately, because C (unlike C++) does not implicitly convert an int** to a const int * const *, we have to cast the passed argument to the correct type. When safety is paramount, however, it is a small price to pay. We could also have moved the first const, without affecting semantics:

int sum2d (int const * const arr[], size_t rows, size_t cols)

Because of the memory layout achieved, we could also treat the actual elements as a con­tigu­ous ar­ray of int values, and pass it to functions that can work with a ‘normal’ array:

cPassing array of pointers as begin/end pointers
int sum (const int* beg, const int* end) {
   int total = 0;                //← also safe when `beg == end`.
   while (beg != end)
      total += *beg++;
   return total;
   }
···
   int t = sum(M[0], M[0] + R * C);

This version also does not suffer from having to cast the argument passed to the sum() function.

Pointer Type Conversions

Simplistically: any object pointer type can be converted to any other object pointer type with an explicit cast. The only implicit conversion between pointers to different types, is from any object pointer type to a void*. In C, the reverse is also automatic, but should not be depended on, since it is not true in C++.

Casting a const T* to T* (a pointer to const, to a pointer to non-const) is permitted in itself, but modifying an object that was defined as const through the resulting pointer is undefined behaviour. It is generally just bad practice.

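Casting function pointers is possible, but very dubious. Converting back to the original type yields the original pointer, but calling a function through a pointer of an incompatible type is undefined behaviour.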

Converting an integer type to a pointer type, and vice versa, is allowed. But the result is im­ple­men­ta­tion defined, and thus not necessarily very portable.

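Assuming data is properly aligned in memory, a pointer to the first byte, e.g. char*, can be cast to a pointer of any object type, e.g. long*. Applying the indirection operation to the result means that, effectively, we can treat any piece of memory as any type of value. Strictly speaking, the aliasing rules of the C standard only guarantee such access through character types, so the results for other types depend on the compiler; copying the bytes with memcpy() is the portable alternative. Again, we try to avoid this as much as possible.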

cTreat memory as different types via casting
unsigned char _mem[] = { 0x41, 0x42, 0x43, 0x44, 0x00, 0x11, 0x22, 0x00 };
printf("%c\n",  *(char*)_mem);           //⇒ A
printf("%s\n",   (char*)_mem);           //⇒ ABCD
printf("%d\n", *(short*)_mem);           //⇒ 16961
printf("%d\n",   *(int*)_mem);           //⇒ 1145258561
printf("%ld\n", *(long*)_mem);           //⇒ 9588842051093057
printf("%c\n", *(char*)(_mem+2));        //⇒ C

The values used are not significant. We only wanted to show that the same sequence of memory can be treated as different types, and hence can produce different values, since each type involves a different number of bytes. The values shown assume a little endian architecture. Also note that, if sizeof(long) is not 8 bytes, it will not display the value indicated in the relevant comment.

Any pointer type value can be converted implicitly to void*, as mentioned before. The reverse is true in C, but not C++, so it should rather be cast explicitly. The return of malloc(), for example, is a void*, and this extract follows the suggestion:

   int* P = (int*)malloc(N * sizeof(int));

This would be considered good programming practice.

Supplementary Topics

A few topics, although not directly related to pointers, are lightly covered below, because they are often not well understood, and this may aid comprehension.

Implicit Data Movement

Some operations are implied, in other words, the operation takes place because of the con­text in which an expression is used, not because of a physical operator. In particular, when passing ar­gu­ments to functions, no operator is required to facilitate the movement of the expression's val­ue to the special local variable of the function, which we call a parameter.

Here is a summary of all occasions where data movement takes place, in other words, where a source and a destination are involved. The source and destination types should either be the same, or the source type must be implicitly convertible to the destination type.

When we say a language has ‘pass by reference’, it means the language has a syntax where­by a pro­gram­mer can spec­i­fy that an argument to a parameter must be automatically passed as an address (trans­par­ent­ly), and that access to the parameter will automatically and trans­par­ent­ly, indirect through the address.

In C, we must do all that explicitly, by declaring the parameter as a pointer type, by explicitly ob­tain­ing the address of the argument, and by explicitly applying the indirection operators — no syntax, nothing automatic, no transparency. Even in languages that support it, ‘passing a ref­er­ence’ is dif­fer­ent from ‘pass by reference’.

Implicit data movement takes place when a value is returned from a function. It is returned as a tem­po­rary, anonymous, variable. Practically, for efficiency, smaller values may be re­turned in a register, but this does not affect the principle. The same rules that apply to as­sign­ment, not only apply to ar­gu­ment pass­ing, but also to function returns.

Concept of a Singular Type / Value

A value in C does not have to map cleanly to assembler types (integers of varying sizes and float­ing point values). A single value in C can be compound; in other words, not a scalar. There is only one way, how­ever, to get a compound, arbitrarily sized value, and that is with a struc­tured type value.

Regardless of size, a structured type value can be moved around (assigned, passed as ar­gu­ment, or returned from a function), as ef­fort­less­ly as an int. We can therefore treat a struc­tured type value as sin­gu­lar, when we require — it is just another T. Technically, however, we cannot call it scalar.

For types that depict singular types, the syntax for definition is simple: T V;. The variable V fol­lows the type T. This is also true for pointer types (scalar values): T* P;, but this is not sur­pris­ing, since we have al­ready ascertained that T* is ‘just another’ Type.

For contrast: With arrays, the syntax requires that the type enfolds the object of the type: T A[N];. The type of A is T[N], but we must write the type around the A. If, however, we create a synonym for this type, we can use it as a singular type:

cType alias for array
typedef int T[3]; // `T` is synonym for `int[3]`.
···
   T A       = { 11, 22, 33 };   //← `A` has type `T`, i.e. `int[3]`.
   int B[3]  = { 11, 22, 33 };   //← `B` has type `int[3]`
   T* P = &A;                    //← `T*` is synonym for `int(*)[3]`.
   int(*Q)[3] = &A;              //← `P` and `Q` have the same type.

   printf("%d %d %d %d\n", A[1], B[1], P[0][1], Q[0][1]);

The output will be 22 for all expressions passed to printf(). This is not useful for function types, which also enfold the subject of the type, but conversely, it is very useful for function pointer types.

Type Syntax Variations

Here is a complete summary of the shapes (syntax) for various categories of types in C. The symbol indicates the subject of the type, i.e., the language element to which the type in the declaration or definition, is ap­plied:

Optional keywords used together with types, as type modifiers, or type specifiers, that affect the storage class, linkage, and volatility:

extern static volatile register const

These have no effect on the operators discussed, except for variables declared with the register storage class, whose addresses cannot be taken.

Performing indirection on a const* pointer yields a const-qualified lvalue, which cannot be modified. Ordinarily, indirection yields a modifiable lvalue.

Remember that subscript and indirect member selection, are also indirection expressions (or in­volve in­di­rec­tion, in the case of the indirect member selection operator).

Summary

All aspects of pointers, and indirection, are supported by a handful of rules. An (admittedly intimate) understanding of a handful of core rules is all that is required. This means mastery is eminently achievable. It does not preclude the requirement of understanding the other rules of C, but considering these are arguably the most complex, the premise should hold.

It should be apparent that arrays are only superficially supported in C, and that pointers, and the op­e­ra­tors that employ them, play an indispensable role.

Pointers and Operators Summary

Summarized in the points below, T is any type, P has type T*, and A is an array of N elements of type T. V is a variable of type T, and F is a function taking parm parameters (any), returning a T value. X can be any integer type (int used here). In short, assume these definitions and declarations as ‘given’, and properly initialised:

T V; T* P; T A[N]; T F(parm); int X;

  1. An address is a number depicting the address of a byte in memory.
  2. A pointer type, e.g., T*, depicts the address of a T value.
  3. &V (address of V) results in a value of type T*.
  4. *P (indirect P) means ‘represent a T at address P’ and is an lvalue by default.
  5. P + X is commutative (X + P), but P - X is not. The resulting address is the address in P, plus or minus X * sizeof(T).
  6. A[X] or X[A] is rewritten as *(A + X) or *(X + A) respectively, before compilation. This is legal, but given A is the pointer operand, using X[A] is discouraged.
  7. A results in a pointer to the first element, and thus has type T*, except as the operand of sizeof or of the address-of operator (see the next rule). This applies to any expression representing an array, not just array names.
  8. &A has type: T(*)[N] (pointer-to-array).
  9. F has type: T(*)(parm) (pointer-to-function).
  10. F(arg) has type T (result of function call operator).
  11. P->M is syntactic sugar (shorthand) for (*P).M.

The last rule requires that P is a pointer to a structured type, and that M is a member of that struc­tured type.

Obtaining Pointers Summary

Although some of the following points are implied by the previous summary, the focus here is on ob­tain­ing pointer type values only. Programmers can obtain a pointer type value by:

  1. Using the address-of operator on lvalues.
  2. Representing an array in an expression.
  3. Using the name of a function.
  4. Using a string literal.

Last Words

There are not many rules. Nor complicated ones. The main problem is the syntax chosen for de­c­la­ra­tions, especially since the types can be combined in endless combinations. Gen­er­al­ly, many ex­am­ples and hours of practice are required before most programmers feel com­plete­ly com­fort­able with all these rules. But it is entirely possible. A suggested course of action is to be­come familiar with these rules, before trying to combine several types, since that tends to ob­scure the patterns. The liberal use of typedef is another feasible technique.


  1. When overloading + on pointers in C++, commutativity is not preserved.↩︎

  2. If the operator is overloaded in C++, this rule does not apply.↩︎

  3. In C++, we also have T::*, or ‘pointer to member’, but it acts more like an offset to a member in a struct or class, and thus is not technically an address.↩︎


2023-06-03: Rephrasing, edits and several new links to resources. [brx]
2021-11-26: Fixed int* where char* should have been. [brx]
2020-05-29: Formatting (due to Ockert van Schalkwyk's advice). [brx]
2019-02-18: Changed type alias FP to FT. [brx]
2018-10-17: Additional typedef for function pointer types. [brx]
2018-08-10: Code corrections, typography changes, small additions & editing. [brx]
2018-05-24: Modified some output examples for arrays-of-arrays. [brx]
2018-04-12: Fixed reported typos. [brx]
2017-11-16: Update to new admonitions. [brx]
2017-09-22: Editing and formatting. [jjc]
2017-03-11: Created. [brx]