Java Lambdas and Streams

Review of Generics

Generics were introduced in Java 5 and are used extensively for type-safe collections. With the arrival of functional programming features in Java 8, generic usage has expanded into lambdas and higher-order functions, which are almost impossible to use without a good knowledge of generics.

Why Use Generics?

Generics allow us to use types (interfaces and classes) as parameters when defining classes, interfaces and methods. Much like formal parameters in method declarations, type parameters allow us to reuse the same code with different inputs. The difference is that the inputs (arguments) to formal parameters are values, while the inputs to type parameters are types.

Code that uses generics has many benefits over non-generic code:

Gives stronger type checking at compile time. A Java compiler applies strong type checking to generic code and issues errors if the code violates type safety. Fixing compile-time errors is much easier than fixing runtime errors.
Enables programmers to implement generic algorithms. By using generics, programmers can implement reusable, customisable, type-safe algorithms that can work with collections of different types in a pluggable manner.
Eliminates typecasts. The following code snippet without generics requires casting:

    List list = new ArrayList();
    list.add("Hello");
    String s = (String) list.get(0); // casting required

When rewritten using generics, no casting is required:

    List<String> list = new ArrayList<String>();
    list.add("Hello");
    String s = list.get(0);   // no casting
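
To illustrate the "generic algorithms" benefit listed above, here is a minimal sketch of one algorithm that works unchanged on collections of different element types. The class name GenericCount and the method countOccurrences() are hypothetical names made up for this example; they are not part of the JDK.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

class GenericCount {
    // Hypothetical helper: counts the elements of 'items' that equal 'target'.
    // T is inferred from the arguments at each call site.
    static <T> int countOccurrences(Collection<T> items, T target) {
        int count = 0;
        for (T item : items) {
            if (Objects.equals(item, target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a");
        List<Integer> nums = Arrays.asList(1, 2, 3, 2, 2);
        System.out.println(countOccurrences(words, "a")); // prints "2"
        System.out.println(countOccurrences(nums, 2));    // prints "3"
    }
}
```

The same method body serves both calls, and the compiler rejects a call such as countOccurrences(words, 2) only if no common type can be found for the arguments.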

Knowing how to use and apply generics is extremely important, especially when using Java 8 and above. Even novice Java programmers need to know how to use classes that support generics. For example, we can’t get the benefit of type checking and type safety when using collections such as List, Map, and Set without knowing how to use generics.

    List<Employee> emps = new ArrayList<Employee>();
    
    Map<String, Employee> empTable = new HashMap<String, Employee>();

Intermediate Java developers should be able to define classes or methods that support generics. In Java 7 and earlier, being able to do this was mostly reserved for advanced developers. But this is done much more commonly in Java 8, because of the need to use generics for lambda expressions and stream processing. The goal of both generics and lambda functions is to make code safer and more reusable, which is a goal that all programmers share.

Syntax for Generic Classes and Methods

When declaring a class or method that supports generics, we need to define the type parameter section, which is delimited by angle brackets (<>). For a generic class or interface it follows the class name; for a generic method it precedes the method's return type. It specifies the type parameters (also called type variables) T1, T2, …, Tn. These identifiers refer to types, not to variables.

The following are some example code snippets that show type parameter usage:

    // generic interfaces
    public interface Iterator<E> { ... }
    public interface Map<K,V>    { ... }
    
    // generic (parameterized) classes and their methods
    public class Stack<E> 
        {
        ...
        public synchronized void push(E item) { ... }
        public synchronized E pop() throws StackEmptyException { ... }
        ...
        }

    // generic methods in non-generic classes
    public static <T> T randomElement(T[] array)      { ... }
    public static <T> T lastElement(List<T> elements) { ... }

As can be seen above, the widely-used type naming convention is to use single uppercase letters, usually:

E for element
K for key
V for value
N for number
R for result
T for type
S, U, V for second, third, fourth types, etc.

This is just a convention, so any valid and relevant identifiers can be used.

Generic Method Examples

The following class is not a generic class, but its methods are generic: the type parameter <T> is declared on each method rather than on the class. The firstMatch() method takes a List of T objects and returns a T object. The <T> at the beginning of the method declaration means that T is not a real type but a type parameter, which the Java compiler infers from the context in which the method is called, typically from the types of the arguments passed to it. (For a generic class, the type argument is instead fixed when an object of the class is instantiated.)

    public class MatchingUtils
        {
        public static <T> T firstMatch(List<T> entries, ...) { ... } 
        ...
        }

We could use the generic firstMatch() method as follows:

    List<Person> people = ...; 
    Person matchedPerson = MatchingUtils.firstMatch(people, ...);
    
    List<Book> books = ...; 
    Book matchedBook = MatchingUtils.firstMatch(books, ...);
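
The second parameter of firstMatch() is elided above. Purely as an assumption, the sketch below uses a java.util.function.Predicate<T> as the match test to show how T is inferred separately at each call. The class name MatchingSketch, the Predicate parameter, and returning null when nothing matches are all choices made for this sketch and are not taken from MatchingUtils.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class MatchingSketch {
    // Assumed signature: the original only shows firstMatch(List<T> entries, ...).
    static <T> T firstMatch(List<T> entries, Predicate<T> test) {
        for (T entry : entries) {
            if (test.test(entry)) {
                return entry;
            }
        }
        return null; // assumption: "no match" is reported as null
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Tom", "Dick", "Harry");
        String longName = firstMatch(names, n -> n.length() > 3); // T is String

        List<Integer> nums = Arrays.asList(3, 8, 12);
        Integer firstEven = firstMatch(nums, n -> n % 2 == 0);    // T is Integer

        System.out.println(longName + " " + firstEven); // prints "Dick 8"
    }
}
```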

The following additional example is again for a non-parameterized class, but with a generic method. The method returns a random element from a generic array that is passed to it.

    public class RandomUtils 
        { 
        private static Random r = new Random(); 
        
        public static <T> T randomElement(T[] array)
            {
            return array[r.nextInt(array.length)];
            }
        }

The T in the randomElement() method declaration refers to the type which Java will infer from examining the arguments of the method call. Even if there were an existing class called T, it would be irrelevant here, because within the method T is a placeholder for a type to be supplied later. The method takes in an array of T objects and returns a T object. For example, if we pass in an Integer array, an Integer object will be returned; if we pass in a Person array, a Person object will be returned. No typecasts are necessary.

We could use the RandomUtils class as follows:

    String names[] = { "Tom", "Dick", "Harry" }; 
    String name    = RandomUtils.randomElement(names); 
    ...
    
    Integer nums[] = { 2, 4, 6, 8, 10 };    // must be an Integer[], not int[]
    int num = RandomUtils.randomElement(nums);
    ...
    
    Color colors[] = { Color.RED, Color.GREEN, Color.BLUE }; 
    Color color    = RandomUtils.randomElement(colors); 
    ...
    
    Person people[] = { 
            new Person("Fred Bloggs", 35), 
            new Person("John Smith",  42), 
            new Person("John Doe",    27), 
            new Person("Jane Doe",    56), 
            }; 
    Person person = RandomUtils.randomElement(people);
    ...

Note again that typecasting is not required to convert to String, Color, Person, or Integer. Auto-unboxing allows us to assign an element from the Integer[] array to an int, but the array passed to randomElement() must be Integer[], not int[], since generics work only with reference types, not primitive data types.

Generic Class Example

The following example is for a very simple generic (parameterized) stack class with push() and pop() methods. The class is generic, and its methods use the class's type parameter E. For comparison, there is a full Stack class in the java.util package.

    public class Stack<E> 
        {
        private E stack[];
        private int sp;      // stack pointer to next empty position
        private boolean isEmpty;
        private boolean isFull;
        
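        // constructor
        @SuppressWarnings("unchecked")
        public Stack(int size)
            {
            // "new E[size]" is not allowed in Java, so create an Object[] and cast it
            stack = (E[]) new Object[size];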
            sp = 0;
            isEmpty = true;
            isFull = false;
            }

        public synchronized void push(E item) 
            {
            while (isFull)
                {
                try
                    {
                    wait();
                    }
                catch (InterruptedException e)
                    {
                    }
                }
            if (sp < stack.length)
                {
                stack[sp++] = item;
                isEmpty = false;
                }
            if (sp == stack.length)
                isFull = true;
            notifyAll();
            }
        
        public synchronized E pop () throws StackEmptyException 
            { 
            // similar code...
            ... 
            }
        ...
        }

Methods in the class can now refer to E both for arguments and for return values, where E doesn’t refer to an existing type. Instead, it refers to whatever type was defined when a stack was created. In the following code, E would refer to a String, the push() method would accept a String parameter, and the pop() method would return a String object:

    Stack<String> words = new Stack<String>(10);
    words.push("Hello");
    ...
    String s = words.pop();
    System.out.println(s);
    ...

In the same way, if we created Stack<Person>, the push() method would accept a Person and the pop() method would return a Person object. No typecasts would be required when using push() and pop().
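
Java does not allow an array to be created directly from a type parameter (new E[size] is rejected by the compiler), so array-backed generic classes typically allocate an Object[] and cast it. The following is a minimal, single-threaded sketch of that idiom. The class name ArrayStackSketch is hypothetical, and it throws the standard IllegalStateException rather than the StackEmptyException used above, whose definition isn't shown.

```java
class ArrayStackSketch<E> {
    private final E[] items;
    private int sp; // index of the next empty position

    @SuppressWarnings("unchecked")
    ArrayStackSketch(int size) {
        // Allocate Object[] and cast; the cast is safe here because
        // the array never escapes this class.
        items = (E[]) new Object[size];
    }

    void push(E item) {
        if (sp == items.length) {
            throw new IllegalStateException("stack is full");
        }
        items[sp++] = item;
    }

    E pop() {
        if (sp == 0) {
            throw new IllegalStateException("stack is empty");
        }
        E item = items[--sp];
        items[sp] = null; // drop the reference so it can be garbage collected
        return item;
    }

    boolean isEmpty() {
        return sp == 0;
    }

    public static void main(String[] args) {
        ArrayStackSketch<String> words = new ArrayStackSketch<String>(2);
        words.push("Hello");
        words.push("World");
        System.out.println(words.pop()); // prints "World"
        System.out.println(words.pop()); // prints "Hello"
    }
}
```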

Type Inference (Diamond) Operator

From Java 7, we can replace the type arguments when invoking a constructor of a generic class with an empty set of type arguments (<>) as long as the compiler can determine, or infer, the type arguments from the context. This pair of angle brackets is informally called the diamond operator.

Using the previous code:

    Stack<String> words = new Stack<String>(10);
    ...

We can use the <> operator with the constructor, because the compiler will be able to infer the type from the usage context:

    Stack<String> words = new Stack<>(10);
    ...

The Java compiler uses a type inference algorithm that looks at each method invocation and the corresponding declaration to determine the type argument(s) that make the invocation applicable and, if available, the type of the returned result. The compiler also takes advantage of target typing to infer the type arguments of a generic method invocation. The target type of an expression is the data type that the compiler expects, based on where the expression appears. Finally, the inference algorithm tries to find the most specific type that works with all of the arguments.
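
To make target typing more concrete, the sketch below calls the JDK methods Collections.emptyList() and Arrays.asList() in contexts that supply different amounts of type information. The class name InferenceDemo and the helper totalLength() are hypothetical; the third call relies on Java 8 or later.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class InferenceDemo {
    // Hypothetical helper whose parameter type provides a target type.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Target typing: the assignment tells the compiler that T is String.
        List<String> a = Collections.emptyList();

        // Explicit type argument (a "type witness"); rarely needed.
        List<String> b = Collections.<String>emptyList();

        // From Java 8, a method argument position also acts as a target type.
        int n = totalLength(Collections.emptyList());

        // Most specific common type: for an Integer and a Double the compiler
        // infers a type both share, which is assignable to List<? extends Number>.
        List<? extends Number> mixed = Arrays.asList(1, 2.5);

        System.out.println(a.size() + " " + b.size() + " " + n + " " + mixed.size());
        // prints "0 0 0 2"
    }
}
```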

Multiple Type Parameters

A generic class can have multiple type parameters. For example, if we wanted to model a key:value pair, where both the key and the value could be of any type, we might create a generic Pair class:

    public class Pair<K, V> 
        {
        private K key;
        private V value;

        public Pair(K key, V value) 
            {
            this.key   = key;
            this.value = value;
            }

        public K getKey()   { return key;   }
        public V getValue() { return value; }
        }

The following statements instantiate a few objects of the Pair class:

    Pair<String, String>  pr1 = new Pair<String, String>  ("Hello", "World");
    Pair<String, Integer> pr2 = new Pair<String, Integer> ("Two", 2);
    
    Person p = new Person("Joe Bloggs", 37, Gender.MALE);
    Pair<String, Person>  pr3 = new Pair<String, Person>  ("Joe", p);

From Java 7 we can use the diamond operator to reduce a certain amount of typing:

    Pair<String, String>  pr1 = new Pair<> ("Hello", "World");
    Pair<String, Integer> pr2 = new Pair<> ("Two", 2);
    
    Person p = new Person("Joe Bloggs", 37, Gender.MALE);
    Pair<String, Person>  pr3 = new Pair<>  ("Joe", p);

Type Erasure

Adding generics to Java created a problem for backwards compatibility, which has always been an important issue when adding new features to the language. The problem was how to allow older, non-generic collection classes to be used alongside newer generic collections.

The designers decided to do this with typecasts:

    // Given a non-generic list...
    List myList = getMyList();
    
    // This is an unsafe cast, but we can do it if
    // we know that myList contains String objects.
    List<String> myStringList = (List<String>) myList;

This means that, on some level, List and List<String> are compatible as types. Java achieves this compatibility by type erasure, which means that generic types are only visible at compile time and are stripped out by the compiler. All that is left after type erasure is the raw type of the container — in this case myStringList has the type of List.

A generic type used without type arguments, such as List, is referred to as a raw type. It is still perfectly legal to work with raw types, but we lose the strict type checking that the compiler gives us, and it is generally a sign of poor-quality code.
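
As a rough illustration of what is lost, the sketch below uses a raw alias to put an Integer into a List<String>. The code compiles (with an unchecked warning, suppressed here), and the mistake only surfaces later, when the element is read back as a String. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RawTypeDemo {
    // Returns true if reading from the polluted list fails at runtime.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean readFailsAtRuntime() {
        List<String> strings = new ArrayList<String>();
        List raw = strings;            // raw alias: allowed for compatibility
        raw.add(Integer.valueOf(42));  // no compile-time error, only a warning
        try {
            String s = strings.get(0); // compiler-inserted cast to String fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readFailsAtRuntime()); // prints "true"
    }
}
```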

Compile and Runtime Typing

Consider the following statement:

    List<Integer> list = new ArrayList<>();

We might be surprised to learn that the type of list differs between compile time and runtime.

At compile time, the javac compiler sees list as a List-of-Integer and uses that information for strict type checking, but the compiler doesn’t know the concrete type of list; it just knows that list is compatible with the List interface.
At runtime, the JVM sees list as a raw ArrayList because of type erasure. The type information about the actual contents has been erased and the resulting runtime type is just a raw type.
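
A small sketch makes the runtime side visible: two lists with different type arguments report exactly the same runtime class. The class name ErasureDemo and the helper runtimeTypeOf() are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Returns the name of the object's runtime class; no type arguments survive.
    static String runtimeTypeOf(Object o) {
        return o.getClass().getName();
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        List<String> strs = new ArrayList<>();
        System.out.println(runtimeTypeOf(ints));                // prints "java.util.ArrayList"
        System.out.println(runtimeTypeOf(strs));                // prints "java.util.ArrayList"
        System.out.println(ints.getClass() == strs.getClass()); // prints "true"
    }
}
```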

Wildcards and Bounds

In generic code, the question mark symbol ? is called a wildcard, and represents an unknown type. The wildcard can be used in a variety of situations: as the type of a parameter, field, or local variable; and occasionally as a return type. The wildcard is never used as a type argument for a generic class instance creation, generic method invocation, or a supertype.

We have three major ways we can use wildcards — unbounded, upper bounded and lower bounded:

An unbounded wildcard uses the syntax <?> and represents all types. It is used as an argument for instantiations of generic types, and is useful in situations where no knowledge about the type argument of a parameterized type is needed. Unbounded wildcards allow the broadest conceivable argument set, because the unbounded wildcard <?> stands for any type without any restrictions.
An upper bounded wildcard uses the syntax <? extends T> and represents all types that are subtypes of T, including type T. T is called the upper bound.
A lower bounded wildcard uses the syntax <? super T> and represents all types that are supertypes of T, including type T. T is called the lower bound.

Bounded wildcards are used as type arguments for generic types. They are useful where only partial knowledge about the type argument of a parameterized type is needed, but where an unbounded wildcard carries too little type information. A bounded wildcard denotes a family of types: with <? extends T>, the family is bounded above by its common supertype T (the upper bound); with <? super T>, it is bounded below by its common subtype T (the lower bound).

We can specify an upper bound for a wildcard, or we can specify a lower bound, but we cannot specify both at the same time.

Unbounded Wildcards

The unbounded wildcard type is specified using the wildcard character ?, for example, List<?>. This is called a list of unknown type. There are two scenarios where an unbounded wildcard is useful:

When we are writing a method that only uses functionality from the Object class.
When we are using methods in the generic class that don’t depend on the type parameter. For example, List.size() or List.clear(). In fact, Class<?> is so often used because most of the methods in Class<T> do not depend on T.

Consider the following printList() method:

    public static void printList(List<Object> list)
        {
        for (Object element : list)
            System.out.print(element + " ");
        System.out.println();
        }

The obvious goal of printList() is to print a list of any type, but unfortunately it can only print a list of Object instances; it can’t print List<Integer>, List<String>, List<Person>, etc., because they are not subtypes of List<Object>.

To write a generic printList() method, we must use the wildcard syntax List<?> as follows:

    public static void printList(List<?> list)
        {
        for (Object element : list)
            System.out.print(element + " ");
        System.out.println();
        }

This works because List<T> is a subtype of List<?> for any concrete type T. That means we can use printList() to print a list of any type:

    List<Integer> iList = Arrays.asList(1, 2, 3);
    List<String>  sList = Arrays.asList("one", "two", "three");
    printList(iList);
    printList(sList);

It’s important to remember that List<Object> and List<?> are not the same. We can add an Object, or any subtype of Object, into a List<Object>. But we can only add null into a List<?>.
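
The difference can be sketched in a few lines. The commented-out statements are the ones the compiler rejects; the class name WildcardDemo and the helper sameSize() are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WildcardDemo {
    // Uses only methods that do not depend on the element type.
    static boolean sameSize(List<?> a, List<?> b) {
        return a.size() == b.size();
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<String> strs = Arrays.asList("one", "two", "three");
        System.out.println(sameSize(ints, strs)); // prints "true"

        List<?> unknown = new ArrayList<String>();
        unknown.add(null);             // allowed: null fits every reference type
        // unknown.add("text");        // does not compile: element type is unknown
        Object first = unknown.get(0); // reads are typed as Object
        System.out.println(first);     // prints "null"

        List<Object> objects = new ArrayList<Object>();
        objects.add("text");           // any Object can be added
        objects.add(42);
        // List<Object> bad = ints;    // does not compile: not a subtype
        System.out.println(objects.size()); // prints "2"
    }
}
```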

Upper Bounded Wildcards

We can use an upper bounded wildcard to relax the restrictions on a variable. For example, let’s suppose we want to write a method that works on List<Integer>, List<Double>, and List<Number>. We can do this by using an upper bounded wildcard.

An upper bounded wildcard restricts the unknown type to be a specific type or a subtype of that type and is written as: <? extends T> where T is the upper bound. In this context, extends is used in a general sense to mean either implements (as in interfaces) or extends (as in classes).

To write a method that works on lists of Number and its subtypes, such as Integer, Double, etc., we would specify List<? extends Number>. The term List<Number> is more restrictive than List<? extends Number> because List<Number> matches a list of type Number only, whereas List<? extends Number> matches a list of type Number or any of its subclasses.
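
As a sketch of this, the method below sums any list whose elements are Numbers. The names UpperBoundDemo and sumOfList() are hypothetical; the method relies only on Number.doubleValue(), which every subtype of Number provides.

```java
import java.util.Arrays;
import java.util.List;

class UpperBoundDemo {
    // Accepts List<Number>, List<Integer>, List<Double>, and so on.
    static double sumOfList(List<? extends Number> list) {
        double sum = 0.0;
        for (Number n : list) {
            sum += n.doubleValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Double> doubles = Arrays.asList(1.5, 2.5);
        System.out.println(sumOfList(ints));    // prints "6.0"
        System.out.println(sumOfList(doubles)); // prints "4.0"
    }
}
```

Had the parameter been declared as List<Number>, neither call in main() would compile.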

Lower Bounded Wildcards

A lower bounded wildcard restricts the unknown type to be a specific type or a supertype of that type and is written as: <? super T> where T is the lower bound.

Suppose we would like to write a method that puts Integer objects into a list. For flexibility, we’d like the method to work with List<Integer>, List<Number>, and List<Object>, i.e. anything that can hold Integer objects.

To write the method that works on lists of Integer and its supertypes, we specify List<? super Integer>. The term List<Integer> is more restrictive than List<? super Integer> because the List<Integer> matches a list of type Integer only, whereas List<? super Integer> matches a list of any type that is a supertype of Integer.

The following code adds the numbers 1 through 16 to the end of a list:

    public static void addNumbers(List<? super Integer> list)
        {
        for (int i = 1; i <= 16; ++i)
            list.add(i);
        }
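
A short usage sketch shows which lists the method accepts. The addNumbers() method is repeated so that the example compiles on its own; the class name LowerBoundDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LowerBoundDemo {
    // Same method as above, repeated so the sketch is self-contained.
    static void addNumbers(List<? super Integer> list) {
        for (int i = 1; i <= 16; ++i) {
            list.add(i);
        }
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        List<Number> numbers = new ArrayList<>();
        List<Object> objects = new ArrayList<>();
        addNumbers(ints);
        addNumbers(numbers);
        addNumbers(objects);
        // addNumbers(new ArrayList<Double>()); // does not compile:
        //                                      // Double is not a supertype of Integer
        System.out.println(numbers.size()); // prints "16"
    }
}
```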

Wildcard Guidelines and PECS

One of the more confusing aspects when learning to program with wildcards is determining when to use an unbounded wildcard, an upper bounded wildcard or a lower bounded wildcard.

There is an acronym coined by Joshua Bloch in his book Effective Java: PECS, which stands for Producer Extends, Consumer Super.

Producer Extends — If we need a List to produce T values (we want to read Ts from the list), we need to declare it with <? extends T>, e.g. List<? extends Integer>. But we cannot add to this list! It only produces objects of that type or that can be upcast into that type.
Consumer Super — If we need a List to consume T values (we want to write Ts into the list), we need to declare it with <? super T>, e.g. List<? super Integer>. We can write into this list, but there are no guarantees what type of object we may read from the list.
If we need to both read from and write to a list, we need to declare it exactly without wildcards, e.g. List<Integer>.

Here is a simple example of copying a source list to a destination list. Note how the source list src (the producing list) uses extends, and the destination list dest (the consuming list) uses super:

    public class Collections 
        { 
        public static <T> void copy(List<? extends T> src, 
                                    List<? super   T> dest)
            {
            for (int i = 0; i < src.size(); ++i) 
                dest.set(i, src.get(i)); 
            } 
        }
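
The following usage sketch copies a List<Integer> into a List<Number>, which only type-checks because of the two wildcards. The copy() method is repeated in a hypothetical CopyDemo class so that the example compiles on its own. Because the method uses set(), the destination must already contain at least as many elements as the source. Note also that the JDK's own java.util.Collections.copy() takes the destination list first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CopyDemo {
    // Same shape as the copy() method above.
    static <T> void copy(List<? extends T> src, List<? super T> dest) {
        for (int i = 0; i < src.size(); ++i) {
            dest.set(i, src.get(i));
        }
    }

    public static void main(String[] args) {
        List<Integer> src = Arrays.asList(1, 2, 3);

        // set() replaces existing elements, so pre-fill dest with nulls.
        List<Number> dest =
            new ArrayList<Number>(Collections.nCopies(src.size(), (Number) null));

        copy(src, dest);          // src produces Integers, dest consumes them
        System.out.println(dest); // prints "[1, 2, 3]"
    }
}
```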

Here is another way to remember when to use super or extends, if we think in terms of an object X:

If we have a list and want to read an X from that list, it has to be a list of X or a list of things that can be upcast to X as they get read out, i.e. anything that extends X:

    List<? extends X> list;

If we want to write X into a list, that list needs to be either a list of X or a list of things that X can be upcast to, i.e. any supertype of X:

    List<? super X> list;

This all boils down to:

Use extends when we only want to get values from a data structure.
Use super when we only want to put values into a data structure.
Use an explicit type when we have to do both.

Summary

Generics (parameterized types) enforce type safety and reduce coding effort.
Classes, interfaces and methods can be created that accept parameterized types.
When using wildcards, remember PECS — Producer Extends, Consumer Super.

2018-05-19: Edited [jjc]
2018-03-28: Revised [lsc]
2018-03-24: Edited. [jjc]
2018-03-24: Created. [lsc]