Overview: programming paradigms
If you are not totally inexperienced about computer programming, in which case reading this article would just plainly be a total loss of time, you certainly know that there are a lot of different programming languages that have been developed in different periods to supply different needs; each one of them carries a different view about the semantic world they cover. There is, indeed, an entire branch of computer science that studiesprogramming paradigms, as are these “views” called.
We don’t really need to review the story of computer programming back to hard wiring in this article, but a brief synopsis is needed in order to make things a bit clearer. So, after someone realized that wiring physically the programs wasn’t just going to work, computers started being programmed using binary code that represented control sequences fed to the computer CPU. That probably made programmers sweat a bit less, but still it was as comfortable as being stabbed. Shortly after, in the early 50s, assembly languages were invented in an attempt to make computer programming easier.
At the time symbolic labels and mnemonic keywords probably seemed like heaven, but still writing assembly is not exactly the most comfortable thing you can think of. It’s somehow like working in the sewers: you know that it’s an important work, and that someone’s to do it, but you really don’t want to know anything more, no matter how many documentaries Discovery Channel makes on it. Coping with the dirty secrets of complexness is not for everybody.
Indeed, it didn’t take much longer again before somebody else got tired of assembly’s complexity and tried to abstract it into what is known as imperative programming: a series of statements that express, step by step, exactly the procedure that should be followed to solve a problem. I don’t really want to dig into the issue, but procedural programming (as is it’s also called, with a particular reference to those languages capable of organizing groups of operations into blocks called “functions”) carried an immense number of mathematical and logical issues, some of which are really worth a read (i.e. Dijkstra’s GOTO statement article, compilation-related automaton theory, design patterns, data representing, ecc), and it’s still the mainly used programming technique.
Yes, because the next-step, id est Object-Oriented programming, is still procedural. By OO programming we mean languages where data, and methods that can manipulate the data, are packed together into single units called “objects” that represent a coherent and transparent set of information; in fact, external user can access the data is via the object’s “methods”, which are nothing else than well-fitted procedures. The revolution carried by OO programming doesn’t lie on a brand new programming paradigm itself, but rather on how programs are structured thanks this sort of overlay. In fact object-orientation is implementable, in different degrees, on every typed language. There are indeed masochistic implementations of assembly-OO-languages (HLA) as well as functional OO languages (Needle), pure OO languages (Ruby) ecc.
OO programming is an elegant way of expressing a program as an algebra. We define objects (classes) as a hierarchy of structured containers of data plus the operations available on the structured container. This way of defining programs allows for a much cleaner level of abstraction, since we do not manipulate data with “random” chunks of code (procedures) but the data only allows structured accesses to itself through its operators, which in turn become part of the data themselves. [GM]
After OO-programming, another little step toward another layer of abstraction has been given by code managing, which means that the program is executed under the management of a virtual machine, instead that directly on the computer’s CPU; wrapping the executable into a sandbox enhances its safety and security and allows the creation of monsters like garbage collectors, but usually at a price of a noticeable overhead (JIT compiling, intermediate languages and similia have been invented to increase efficiency of interpreted languages). It’s important to observe, though, that theoretically every language could be interpreted (executed into a sandbox) or compiled (and then executed directly on the CPU).
Managed code allows to define safer constructs, for example forbidding (let the Lord of Heaven be praised) pointer arithmetic (cit. Djikstra). By providing some restrictions on managed code, we can achieve compilers that guarantee very strong statically validated properties (see Singularity compiler) and very high level runtime primitives ranging from lambdas to reflection to quotations. [GM]
On the other side of the barricade (which means universities), another paradigm was being developed in the meantime: declarative programming. For a long time this has been considered just an academic exercise, but later it has gained popularity among specialist. A program written in a declarative language will express the logic of a computation without describing its control flow, focusing on (indeed) declaring what the program should accomplish, rather than describing how to go about accomplishing it.
Different needs
The relationship between languages and expressiveness is hence summarized in the following picture.

Where imperative languages allow the programmer to have a deeper control of what the machine’s actually doing, the declarative approach simplifies a lot the task of programming: we’ll see in the next paragraphs how this abstraction is obtained, what it’s useful for and which needs may supply.
What should be clear at this stage is that there’s no such thing as a “better language” as a general statement. There are more suitable languages for different purposes: nobody would enjoy writing a parser in assembly just as nobody would write a kernel in F# without being a total sandwich short of a picnic, but both things are possible.
Trends
As suggested by Anders Hejlsberg recently, there are three trends that broadly impact on computer programming today. The first is the tendency toward forms of more declarative programming, which is made clear by the wide diffusion of Domain Specific Languages, which are languages targeted to a specific context (XAML, XSLT, HTML, SQL, CSS, Makefiles and so forth). There’s a huge debate around what is and what isn’t a DSL, but here we’ll just observe that they play a relevant role in today’s programming scene.

Second comes the need for dynamic languages. According to Wikipedia, such languages are “high-level programming languages that execute at runtime many common behaviors that other languages might perform during compilation, if at all. These behaviors could include extension of the program, by adding new code, by extending objects and definitions, or by modifying the type system, all during program execution. These behaviors can be emulated in nearly any language of sufficient complexity, but dynamic languages provide direct tools to make use of them.”
Last and definitely not least, is the need for more concurrency. As you may know, nowadays Moore’s law doesn’t apply to CPUs frequency any longer, but it rather does to the number of cores that processors have. This is an actual revolution the market is pushing, which is throwing into crisis the instruments that have been developed after more than half a century of time-sharing mono-processor multitasked programming. Anyone who has ever had the misfortune of coping with Java’s threading model, for instance, knows how complex is to develop multithreaded applications that perform real concurrency. There are countless problems due to shared memory locations and resources, side-effects, (dead)locks, and so on. Thanks to memory sharing issues, it’s impossible to parallelize efficiently different operations if they share variables or resources, because the broad and unavoidable use of mutexes would make them wait for each other exactly as it would happen in a sequential execution. Such solutions don’t really sound like viable possibilities in a market where CPUs with hundreds of cores are foreseen to be developed within the next five years.
Functional Programming and λ-calculus
So, straight to the core of this article, functional programming is possibly an answer to all the needs that computer programming is asking for today. FP is a particular kind of declarative programming where the program consists entirely of functions, in a mathematical sense. Hence, functional programs do not contain any assignment statement, since there aren’t variables and therefore, no side-effects (Although very few modern FP languages don’t allow for side-effects. Haskell is a notable exception, confining side-effects only to the restricted scope of monads. Languages like OCaML or F# allow wide use of mutability while making everything immutable by default. [GM]). So since expressions denote one single value, they can be dynamically evaluated at any time, freeing the programmer from the burden of prescribing one particular control flow; functional programs are hence said to be referentially transparent. We’ll see how all of this is possible further on this series of articles.
The theoretical framework in which FP can be put into is Lambda calculus, a formal system devised in the 30s by Alonzo Church to investigate functions, function application and recursion. It’s basically a very general programming language itself, made of a single transformation rule (variable substitution) and a single function definition schema. In spite of this simplicity, it does formalize every possible computable function.
Objects of study in Lambda calculus are λ-terms, which are anonymous functions that perform an expressed transformation to their arguments. A basic λ-term is this:

which represent an identity function. The following expression is a simple λ-term that expresses a function (λ) that takes x as a argument an returns its double (2x).

Obviously functions can be nested, like this:

That is the function of x that returns the function of y that returns the product of the sum of x and y and 2. Since all λ expressions have only one argument, they can be cascaded like in 3. Such a practice is called currying, not because it’s spicy, but because it’s named after Haskell Curry, whom we’ll meet again soon.
Evaluating λ-expressions is a simple task that reminds formal grammars sentence derivation:




When lambdas come in, the situation gets a little bit more interesting. (The parenthesis is used to “call” a function; when a λ-term is inserted into parenthesis that means that it’s to be evaluated).



The substitution we operated between 8 and 9 on the formal parameter is so called β-reduction, which is a particularly important operation (along with α- and η-conversion), because the meaning of lambda expressions is defined by how expressions can be reduced. It’s possible to apply recursively λ also to other expressions, as follows:



In an expression, each appearance of a variable is either bound or free, depending on whether such variable is an argument of λ or not. β-reduction of

Replaces every x that occurs free in E with y, where E is another well-formed λ-expression. In general, to make λ-terms less confusing, we can use the α-conversion, which is nothing more than a rule that allows us to rename parameters and that we implicitly used also in 3.

It comes quite natural to evaluate λ-terms using the same application order that we are used to normalize polynomials with, but there are actually several issues related to the order that can be chosen to reduce an expression. Basically there are two possibilities: always reduce the leftmost innermost redex (which is a sub-expression that can be reduced), or always reduce first the leftmost outmost redex. These two approaches are respectively called applicative and normative reduction. The former is usually more efficient, the latter accepts a larger class of programs. What matters here is that progressive reduction should lead to what it’s known as a normal form, that is a λ-expression that cannot be reduced any further. Thus,

Is the normal form of

Like human relationships, not every λ-expression has a normal form. Trying to reduce a non-reducible expression to a normal form will only lead to the same form again and again in an infinite loop. There is, anyway, no algorithm which takes as input two λ-expressions and outputs TRUE or FALSE depending on whether or not the two expressions are equivalent. This was historically the first problem for which undecidability could be proven. (There is a strong relationship between Russel’s Paradox and untyped λ-terms where something is applied to itself. [GM])
What is probably the most important result in λ-calculus, is the Church–Rosser theorem, which states that if there are two distinct reductions starting from the same lambda calculus term, then there exists a term that is reachable via a sequence of reductions from both redex. In other words, if there’s a normal form, there’s a strategy to find it and the result is unique; in formal terms, CR theorem is a proof of confluence. At a higher level, it means that evaluation (β-reduction) can be carried out in any order, even in parallel.
It’s important to notice that functional programming is not language-specific, because it’s a general theory (and functional programming is λ-calculus with a more palatable syntax). For instance, C++ or Java can be programmed in a functional way (an article about this will follow), although there are languages specifically designed for the task. What matters is that, on a more formal side, it’s been proven the isomorphism between Turing Machines and λ-calculus.
Λ-calculus give us a solid mathematical foundation, basing on which it’s possible to comprehend that functional programming is no more an academic exercise, but rather it’s the next natural step in computer programming.
—————————————————————————–
[GM] notes by Giuseppe Maggiore