Thursday, November 27, 2014

Functional Programming in Mainstream Languages, Part 2: Introduction to Higher-Order Functions in Java 8

(Jump to Table of Contents)
If immutability is the primary principle of functional programming, higher-order functions are its most important technique.  Higher-order functions, also called lambas, are now part of many mainstream languages.  In particular, they have been added to Java 8.  In this post I will show how higher-order functions can be expressed in Java 8.

I can't explain the details of the theory and techniques here; I highly recommend the wonderful book Structure and Interpretation of Computer Programs (2nd Edition) by Abelson and Sussman.  Most of the examples in this series of posts are taken from that book.

In functional programming, functions are first-class objects.  This means that functions can be passed as parameters to other functions, they can be returned from functions, they can be stored in data structures; in other words, anything you can do with any other object in the language, such as numbers or strings, you can do with functions.

Some languages fake it: for example, in C you can pass around function pointers, but not function objects.  The major difference is that function objects contain not only the code the function executes, but also its environment, meaning all bound variables known to the function.  This is particularly important (and powerful) when functions are nested inside other functions (or methods, or any other binding scope).

The inventors of LISP struggled with the correct semantics of first-class functions; they called this the "funarg problem."  It turns out that the power of functional programming requires lexical scope, which means that variable names must be resolved with respect to the nesting of functions in the code, rather than to the structure of the calling stack at runtime.  And in order for that to work correctly, it is impossible to allocate function parameters and other local variables on the program stack if those may be referred to in an internal function that is returned from the call.  This complicates the compiler, but is invisible to programmers using the language.

As a simple example, consider a function make-adder that accepts a single number, and returns a function that adds that number to its own argument.  For example, if make-adder is called with the value 1, it will return a function that can be called increment, since it adds one to whatever argument it is given.  Here is how to write this function in Scheme and in Java 8.

Scheme
(define make-adder (lambda (x) (lambda (y) (+ x y))) (define increment (make-adder 1)) (increment 4)
Java 8
IntFunction<IntUnaryOperator> makeAdder = x -> y -> x + y; IntUnaryOperator increment = makeAdder.apply(1); int res = increment.applyAsInt(4);

The result of the last expression (in either language) is 5.

The syntax is quite different, but the effect is the same.  Let's look at the functions first.  The Scheme syntax is a direct translation of the syntax used in the Lambda Calculus, which was invented by the great logician Alonzo Church, and which forms the theoretical basis for functional programming (in addition to its major importance in the theory of computer science).  In the lambda calculus, the make-adder function would be written like this: λxy.x+y.

This syntax describes a function of one argument whose name is x; this is indicated by the notation "λx.".  What follows the dot is the value of the function for a given value of x; in this case, this is another function, whose argument is called y.  The value returned by this function is the sum of x and y.

In Scheme, a function is denoted by the syntax (lambda (v1 v2 ...) body), where v1 v2 ... are the parameter names, and body is an expression representing the value that the function returns.

In Java 8, function syntax has the names of the parameters before the arrow (->), and the body follows the arrow.  It is possible to specify more than one argument, in which case the arguments need to be put in a parenthesized list.  It is also possible to specify the types of the arguments, but fortunately the designers of Java have finally realized what a burden repetitive types are on programs, and Java 7 introduced type inference into the compiler in order to eliminated some (but not enough) of this repetition.  Java 8 extends this mechanism also to lambdas.  In this case, the compiler can infer the types of the arguments from the type declared for the function makeAdder.  This type needs some explanation.

Java 8 has a new set of interfaces to describe various types of functions.  For example, Function<T,R> is the type of functions that take an argument of type T and return a result of type R.  However, there is a whole set of interfaces that cater for primitive types, in order to avoid boxing and unboxing.  (This primary mistake in the design of Java has been giving programmers grief since Java 1, and is unlikely to go away.)  So IntFunction<R> denotes functions that take an int and return a value of type R.

What is the return type of makeAdder?  It is a function that takes an int and returns an int; such functions are described by the interface IntUnaryOperator.  The corresponding interface for functions that take an object of type T and return an object of the same type is UnaryOperator<T>.

As I will discuss in a later post, typed functional languages such as ML and Haskell have a rich set of functional types and a concise yet very expressive type notation.  The type of makeAdder would be specified in such languages simply as int -> int -> int, a notation that is similar to the way functions are defined.  This one indicates a function that receives an int, and returns a function of type int -> int, that is, a function that takes an int and returns an int.  Such notations would make the burden of type specifications in Java much easier.

#functionalprogramming #java8

No comments:

Post a Comment