Java Generics in Depth - Stan’s Signed Snippets

Why do we need Java generics?

Generics introduce a new level of compile-time type safety. This in itself makes them one of the most impactful updates to Java’s core and, arguably, brings a sane amount of type safety to the Collections framework. The most common bug generics are meant to address is a nasty ClassCastException when we believe an object is of one type when really it’s not. With generics, this is caught at compile-time. Additionally, they improve code readability by relieving the programmer of explicit casts.

Quick overview of terminology

simple generic class that implements a generic interface

public class MimicList<T> implements List<T> {
  public boolean add(T item) { ... } // List.add returns boolean
  
  public String findAlphaMimic() { ... }
...
}

Here, MimicList is a generic type and T is its type placeholder (formally, a type parameter). The placeholder isn’t limited to a single letter: any valid Java identifier will do. A generic type can have one or more type placeholders, and adding a placeholder to a class is what turns it into a generic type. The placeholder is replaced by a concrete type when we declare/initialize MimicList as a concrete parameterized type. If your generic class extends or implements another generic type, you can pass the same placeholder (T) to the supertype so that both refer to the same type. You should not instantiate a generic type without a type argument (new MimicList()): it compiles, but only as a raw type, with an unchecked warning and none of the type safety. You also cannot use a primitive type as a type argument. That’s because it must be convertible to java.lang.Object when the compiler performs type erasure.

a parameterized concrete type

MimicList<Earth> omegas = new MimicList<Earth>();

A parameterized type depends on an existing generic type definition. If I tried to declare MimicList<Earth> without MimicList being a generic type (as defined above), it would fail at compile-time.

The code below demonstrates why we can’t use primitives as type arguments. After the compiler strips the type information (type erasure, discussed later), it has to insert casts to ensure that we are working with the parameterized type we initialized the object with. If a cast fails, say because we try to put the result of list.get(0) into an Integer when the list holds Strings, a ClassCastException is thrown.

user generated code

MimicList<String> mimicList = new MimicList<String>();
mimicList.add("MIMIC");
String mimic = mimicList.get(0);

becomes..

compiler code … effects of type erasure

MimicList mimicList = new MimicList();
mimicList.add("MIMIC");
String mimic = (String)mimicList.get(0); // the compiler must add a cast to enforce type safety

The compiler erases all type information from parameterized types and instead adds explicit casts to raw types. At the bytecode level, a generic class and a raw class look exactly the same. One consequence of this is that objects at runtime do not contain information about their generic arguments (although the information is still present on fields, methods, constructors, and extended classes and interfaces). A benefit is that, since all type information is erased, only one version of the generic class needs to be stored in the bytecode for all variations of possible types (in comparison to C++ templates, where each type gets its own version).
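
Here is a minimal sketch of both halves of that claim. ErasureDemo and its names field are made up for illustration; the reflection calls are standard java.lang.reflect ones.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Hypothetical field: its declaration keeps the generic signature in the class file.
    static List<String> names = new ArrayList<>();

    // Objects do not keep their type arguments: both lists share one erased class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    // Reads the declared generic type of the field through reflection.
    static String declaredFieldType() {
        try {
            Field field = ErasureDemo.class.getDeclaredField("names");
            return field.getGenericType().getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

sameRuntimeClass() returns true, while declaredFieldType() still reports java.util.List<java.lang.String>, because the field declaration, unlike the object, keeps its signature.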

The type arguments of an object are erased completely, making generic types non-reifiable. The compiler also adds bridge methods on a case-by-case basis.
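
To see a bridge method for yourself, a small reflection sketch like the following should do. LengthComparator and BridgeDemo are hypothetical names; Method.isBridge() is the standard reflection call.

```java
import java.lang.reflect.Method;
import java.util.Comparator;

// After erasure, Comparator declares compare(Object, Object), so the compiler
// generates a bridge with that signature that forwards to compare(String, String).
class LengthComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        return Integer.compare(a.length(), b.length());
    }
}

class BridgeDemo {
    // Counts the compiler-generated bridge methods declared directly on a class.
    static int countBridges(Class<?> type) {
        int bridges = 0;
        for (Method method : type.getDeclaredMethods()) {
            if (method.isBridge()) {
                bridges++;
            }
        }
        return bridges;
    }
}
```

BridgeDemo.countBridges(LengthComparator.class) finds exactly one bridge: the synthetic compare(Object, Object).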

Extending generics

I briefly alluded to the fact that we can easily extend an existing generic class with a new generic class without much complication. In the example above, type T is an unbounded type parameter that acts as the type placeholder for MimicList. If we initialize MimicList as a raw type, then the underlying supertype will also be treated as a raw type.
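
As a rough sketch of what that looks like in practice, here is a made-up CountingList (not the MimicList from above) that extends ArrayList and hands its own placeholder T to the superclass.

```java
import java.util.ArrayList;

// Hypothetical example: T is declared once and passed on to ArrayList<T>,
// so CountingList<String> is also an ArrayList<String>.
class CountingList<T> extends ArrayList<T> {
    private int adds;

    @Override
    public boolean add(T item) {
        adds++;
        return super.add(item);
    }

    // Number of times add(T) was called on this list.
    int addCount() {
        return adds;
    }
}
```

Declaring CountingList<String> names = new CountingList<>(); gives us a list whose add and get are both typed to String, with no casts in sight.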

Raw Classes

A raw class is basically a generic class used without any type arguments. Any inner classes of a raw class will also be raw. The only exception is a static nested class. It would not be considered raw, because it’s technically not part of the parameterized type. It’s not even tied to an instance, since it’s just a static.

When a declaration such as a method parameter never refers to the placeholder by name, we have no need for one. In such cases we can just use ? as the type argument to mean the same thing (for example List<?>). It has the same effect as an unbounded type parameter: any reference type is accepted. We still have the same type safety guarantees as if we had used a Java identifier instead. A wildcard without bounds is called an unbounded wildcard.
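
A small sketch of that equivalence, using a hypothetical WildcardDemo helper: the first method names a placeholder it never uses, the second swaps it for an unbounded wildcard.

```java
import java.util.List;

class WildcardDemo {
    // T is declared but never referenced in the body.
    static <T> int countNonNull(List<T> items) {
        int count = 0;
        for (Object item : items) {
            if (item != null) {
                count++;
            }
        }
        return count;
    }

    // Same behaviour with an unbounded wildcard: accepts a List of any reference type.
    static int countNonNullWildcard(List<?> items) {
        int count = 0;
        for (Object item : items) {
            if (item != null) {
                count++;
            }
        }
        return count;
    }
}
```

Both versions accept a List<String>, a List<Integer>, or anything else, and neither lets us add an arbitrary object to the list behind the compiler’s back.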

Variance

Covariant, invariant, and contravariant: these concepts are the building blocks of subtyping in modern languages. In Java, generics are invariant by default. Just because class Y is a subclass of class X does not mean that SomeGeneric<Y> will be a subtype of SomeGeneric<X>. It won’t. The main reason is type safety: because generics are non-reified thanks to type erasure, the runtime cannot check what goes into a collection the way it does for arrays. However, there is a syntactic addition (wildcards) to make a type argument covariant or contravariant at the point of use.
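
A quick sketch of invariance and the two wildcard escape hatches. InvarianceDemo and its methods are invented for this example, and the line that does not compile is left commented out.

```java
import java.util.ArrayList;
import java.util.List;

class InvarianceDemo {
    // Covariant use: reads Numbers out of a list of any Number subtype.
    static double sum(List<? extends Number> numbers) {
        double total = 0;
        for (Number n : numbers) {
            total += n.doubleValue();
        }
        return total;
    }

    // Contravariant use: writes an Integer into a list of Integer or any supertype.
    static void fill(List<? super Integer> sink) {
        sink.add(42);
    }

    static double demo() {
        List<Integer> ints = new ArrayList<>();
        ints.add(1);
        ints.add(2);
        // List<Number> numbers = ints;          // does not compile: invariance
        List<? extends Number> readOnly = ints;  // compiles: covariant view of the same list
        return sum(readOnly);
    }
}
```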

Bounded Type Parameters

Types can be bounded with the extends or super keyword. If you want to restrict initialization to a type or its subtypes, you use the extends keyword (covariance). If you need to limit the initialization to a type or its supertypes, you use the super keyword (contravariance); note that super is only available on wildcards, not on declared type parameters. A type parameter is not limited in how many bounds it can specify (they are joined with &). You can only have one class bound (since multiple inheritance is not allowed in Java), but you can have an unlimited number of interface bounds! Later we’ll discuss why this makes our code much more flexible without decreasing type-safety.
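
For instance, a type parameter with one class bound and one interface bound might look like the sketch below. BoundsDemo and max are hypothetical; the bounds are joined with &, and the class bound has to come first.

```java
import java.util.List;

class BoundsDemo {
    // T must be a Number (class bound) and comparable to itself (interface bound).
    // Assumes a non-empty list.
    static <T extends Number & Comparable<T>> T max(List<T> items) {
        T best = items.get(0);
        for (T item : items) {
            if (item.compareTo(best) > 0) {
                best = item;
            }
        }
        return best;
    }
}
```

A List<Integer> or List<Double> is accepted; a List<String> is rejected at compile time because String is not a Number.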

History of Type erasure

Type erasure exists because Sun wanted to keep binary compatibility with older versions of Java (1.4 and below) when Java 5 introduced generics. It’s also basically the reason raw types are still allowed. There is no excuse to use a raw type when you have wildcards; at worst, the unbounded wildcard type should fit any scenario. Bridge methods are quite useful since they let us use generic types as raw types and, more importantly, allow us to use parameterized types in method calls after their type parameters are erased by .. type erasure. Is that redundant enough for you?? Unfortunately, because Java generics are non-reified, there are two exceptions where raw types must be used in new code:

Class literals, e.g. List.class, not List<String>.class
instanceof operand, e.g. o instanceof Set, not o instanceof Set<String>
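
Both exceptions fit in a few lines. RawExceptionsDemo is a made-up name, and the pattern of checking with the raw type and then casting to the unbounded wildcard is one common way to get back to safe territory.

```java
import java.util.List;
import java.util.Set;

class RawExceptionsDemo {
    // List<String>.class does not compile; the class literal has to be raw.
    static String listClassName() {
        Class<?> listClass = List.class;
        return listClass.getName();
    }

    // The instanceof check uses the raw type, then we cast to Set<?> right away
    // so the rest of the method keeps its compile-time checks.
    static int sizeIfSet(Object o) {
        if (o instanceof Set) {
            Set<?> set = (Set<?>) o;
            return set.size();
        }
        return -1;
    }
}
```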

Sometimes you need to use a raw type or an explicit unchecked typecast. Whether it’s for immovable things like legacy code or practical purposes like unit/mock testing, there are acceptable scenarios where we might want to forgo strong compile-time safety. To do so, we have to annotate the piece of code with @SuppressWarnings("unchecked").

suppress the cries of the compiler

public void someETL(Collection legacyCollection) {
  @SuppressWarnings("unchecked")
  List<String> typedLegacyCollection = (List<String>) legacyCollection;
  // process
}

It’s recommended you place the annotation as close to the offending line as possible. We could have placed it before the function definition, but then all unchecked warnings in that function would be ignored, not just the initial conversion to List<String>. Likewise, if we placed it before the class declaration, it would mask weak typing warnings throughout the entire class!

Wildcards

Wildcard types can be pretty confusing, so I’ll just give a simple overview. They are very useful when we want to introduce some type flexibility into our functions and collections while keeping all the compile-time safety that generics provide.

simple mimic list

public class SimpleMimicList<?> { // does not compile: we can't use wildcards in class declarations
  
  public String findAlphaMimic() { ... }
...
}

One practice that takes advantage of flexible wildcards is the PECS principle.

Producer Extends, Consumer Super principle

The idea behind PECS is super simple, but it’s not intuitive just from reading that title. In fact if you dig deep enough it’s a very complicated topic dealing with variance(covariance and contravariance). But it’s actually very simple if you just think of it in terms of type safety.

Let’s start with a base type Soldier and some subclasses

public class Soldier {
  void attack() {
      //pew pew
  }
}

public class Rita extends Soldier {
  void attack() {
      //SPLICE SPLICE BOOM
  }
}

public class Cage extends Soldier {
  void attack() {
      //how do I turn this safety off??
  }
}

public class Kimmel extends Soldier {
  void attack() {
      //
  }
}

The “Producer” collection

A Producer collection is one whose element type must be the specified type or a class that extends it. When we have a wildcard type of <? extends Soldier>, it means:

You are guaranteed that when you read from this collection, the object will be at least a Soldier
You cannot put anything into this collection, except for null. That’s because null can technically be of any type.
This is called a Producer collection because it produces data, from the point of view of the collection

The reason that you can’t put anything inside of a Producer collection is because it will break the type safety guarantees. Let’s give an example:

writing to extends(producer)

List<? extends Soldier> allSoldiers = getAllCages(); // returns ArrayList<Cage> 
allSoldiers.add(new Cage()); //not legal
allSoldiers.add(new Soldier()); // not legal as well..
allSoldiers.add(new Object()); //not legal
allSoldiers.add(null); // this is legal (http://img.444.hu/jackie.gif)

Why are the first three not legal? Because the allSoldiers reference can point to a collection of Soldier, Rita, Cage, or Kimmel, and at compile time we don’t know which one it’s going to be. For type safety, the compiler cannot allow us to add an object that might or might not cause a ClassCastException later. All the compiler knows is that whichever list we initialize allSoldiers with, its elements must at least be Soldiers.

reading from extends(producer)

List<? extends Soldier> allSoldiers = getAllCages(); // returns ArrayList<Cage> 
Soldier oneSoldier = allSoldiers.get(0); // this is perfectly fine.
Cage oneCage = allSoldiers.get(0); // this is NOT legal without an explicit cast.

This argument is similar. Can you spot the pattern here? Yes, it’s all about ensuring type safety. At compile time, we can only be sure that the elements of allSoldiers are at worst Soldiers. At run-time, it could point to a list of Kimmels for all we know.
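
Put together as a method, a producer parameter might look like this sketch. ProducerDemo is hypothetical, and its nested Soldier and Cage are stand-ins for the classes above, with attack() returning a String (an assumption, so there is something to collect).

```java
import java.util.ArrayList;
import java.util.List;

class ProducerDemo {
    // Stand-ins for the article's classes; attack() returns a String here for illustration.
    static class Soldier {
        String attack() { return "pew pew"; }
    }

    static class Cage extends Soldier {
        @Override
        String attack() { return "click"; }
    }

    // The squad only produces Soldiers for us: we read from it and never write to it.
    static List<String> attackAll(List<? extends Soldier> squad) {
        List<String> log = new ArrayList<>();
        for (Soldier soldier : squad) {
            log.add(soldier.attack());
        }
        // squad.add(new Cage()); // does not compile: the element type is unknown
        return log;
    }
}
```

The same method accepts a List<Soldier> and a List<Cage>, which a plain List<Soldier> parameter would not.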

The “Consumer” collection

A Consumer collection is one whose element type is the provided class or one of its supertypes. When we have a wildcard type of <? super Soldier>, it means:

You are guaranteed that you can write a Soldier (or any subtype of Soldier) to this collection
As a consequence, we can only initialize the reference with a collection whose element type is Soldier or one of its supertypes
This is called a Consumer collection because it consumes data, from the point of view of the collection
The only type you are guaranteed to get back when you read from this collection is Object

initializing super(consumer)

List<? super Rita> maybeRitas; // declared once, the variable cannot be redeclared
maybeRitas = new ArrayList<Soldier>(); // legal, Soldier is a supertype of Rita
maybeRitas = new ArrayList<Rita>(); // legal, its own type guarantees type safety
maybeRitas = new ArrayList<Object>(); // legal
maybeRitas = new ArrayList<Cage>(); // not legal, Cage is not a supertype of Rita

Why does ? super Rita give us flexible type safety? In the example code above, it’s clear that we can only initialize maybeRitas with a list of Rita or one of its supertypes. So it can only be a list of Rita, Soldier, or Object. Let’s say we then want to add something to this collection. We are confident that if we add an instance of Rita to maybeRitas, it is guaranteed to be a subtype of whatever type we initialized the list with (Rita, Soldier, Object). But you can’t, for example, add a String, an Integer, or a Cage to maybeRitas no matter which of those three initializations was chosen. Even a plain Soldier is rejected, because the list might be an ArrayList<Rita>.
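
Here is a small sketch that puts both halves of PECS in one signature. PecsDemo and transfer are hypothetical names, and the nested Soldier and Rita are empty stand-ins for the classes above.

```java
import java.util.List;

class PecsDemo {
    // Empty stand-ins for the article's classes.
    static class Soldier { }
    static class Rita extends Soldier { }

    // source produces Ts (extends), destination consumes Ts (super).
    static <T> void transfer(List<? extends T> source, List<? super T> destination) {
        for (T item : source) {
            destination.add(item);
        }
    }
}
```

With this signature a List<Rita> can be poured into a List<Rita>, a List<Soldier>, or a List<Object>, but never into a list of some unrelated type.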

This discussion could go further into the differences between extends and super, covering not only initialization but also the implications for the methods of the initialized objects. I might delve into that later, but this seems like a nice overview of Java’s attempt to bring more compile-time type safety to the language.