Deep learning is a subset of machine learning methods that are based on artificial neural networks. These are inspired by information processing and distributed communication nodes in biological systems such as the brain. In deep learning, each level learns to transform the input data into a slightly more abstract and composite representation. For instance, in a facial-recognition system, pixels might be one layer of the system, while edges might be another, eyes might be another, and the face might be another. The complexity of deep learning methods makes using existing packages popular in the programming community. TensorFlow and PyTorch in Python are popular, as is the Keras package in R. However, in the production of deep learning systems, performance and safety are two issues that drive companies to choose functional programming languages such as Clojure and Haskell instead.
The difficulties of deep learning implementations
In putting deep learning systems into production, neural networks might contain a million parameters. Data can quickly explode to train these parameters. This explosion of data requires performance that can only be achieved by an efficient programming language with safe concurrency and parallelism capabilities. Due to the complexity of neural networks, with data passed from layer to layer, simplicity and consistency in the way the programming language handles this data is important. Safety, in this case, means the ability to preserve the state of the original data in a consistent manner, while simplicity means being able to read and maintain the code base easily while maximizing performance.
Why functional programming is more suitable for deep learning
In an attempt to resolve some of the difficulties that can occur when implementing deep learning, programmers are finding that functional programming languages can provide solutions.
In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids changing state and mutable data. It is a programming pattern that is closer to mathematical thinking.
Deep learning models are essentially mathematical models. For instance, artificial neural networks comprise connected nodes, each of which performs simple mathematical operations. By using a functional programming language, programmers are able to describe these mathematical operations in a language that’s closer to the operations themselves. The explicit way these programs are written makes reading and maintaining the code base much easier.
At the same time, the compositional nature of deep learning algorithms means that, at each layer of the neural work, the layers or the functions tend to chain together to perform tasks. This can be easily implemented using the functional chaining of a functional programming language.
Furthermore, in deep learning, when functions are applied to the data, the data does not change. New values might be output sequentially down the line, but the data itself stays consistent. The immutability feature of a functional programming language will allow the programmer to create a new dataset each time new values are generated without altering the original immutable dataset. This makes it easier to maintain the consistency of the data throughout the neural network.
Finally, the large number of parameters and training data involved in the implementation of deep learning means that parallelism and concurrency are the keys to creating production-level deep learning systems. Parallelism means running threads on different CPUs to speed up the learning process. Concurrency means the ability to manage threads to avoid conflict. Functional programming allows for concurrency and parallelism at no cost. This means that, by its nature, functional programming, where the pure function is stateless, will always produce the same output for a particular input, lead to the ability to isolate any function, and execute whenever you want it to. This makes concurrency and parallelism much easier to manage. You don’t have to deal with issues such as deadlocks and race conditions. Different threads accessing different CPUs will be able to run independently with no contentions.
With functional programming gaining popularity in deep learning, and with the robust packages available for deep learning, Clojure is now favored by companies such as Walmart and Facebook. It’s a high-level, dynamic functional programming language based on the LISP programming language and it has compilers that make it possible to run on both the Java and the .NET runtime environment.
The power of concurrent programming in Clojure
Clojure doesn’t replace the Java thread system, rather it works with it. Since the core data structures are immutable, they can be shared readily between threads. At the same time, state changes in the program are possible, but Clojure provides mechanisms to ensure that states stay consistent. If conflicts occur between 2 transactions trying to modify the same reference, one of them will retire. There’s no need for explicit locking.
(import ‘(java.util.concurrent Executors)) (defn test-stm [nitems nthreads niters] (let [refs (map ref (repeat nitems 0)) pool (Executors/newFixedThreadPool nthreads) tasks (map (fn [t] (fn  (dotimes [n niters] (dosync (doseq [r refs] (alter r + 1 t)))))) (range nthreads))] (doseq [future (.invokeAll pool tasks)] (.get future)) (.shutdown pool) (map deref refs))) (test-stm 10 10 10000) -> (550000 550000 550000 550000 550000 550000 550000 550000 550000 550000)
Parallelism in Clojure is cheap
In deep learning, models have to train on large amounts of data. Parallelism implies running multiple threads on different CPUs. Parallelism that is cheap will mean significant performance improvements. Using partition in conjunction with map can achieve parallelism that is less costly.
(defn calculate-pixels-2  (let [n (* *width* *height*) work (partition (/ n 16) (range 0 n)) result (pmap (fn [x] (doall (map (fn [p] (let [row (rem p *width*) col (int (/ p *height*))] (get-color (process-pixel (/ row (double *width*)) (/ col (double *height*)))))) x))) work)] (doall (apply concat result))))
Chaining functions in Clojure means clarity
In Clojure, there are many functions for very few data types. Functions can also be passed as arguments to other functions, which makes chaining functions in deep learning possible. With implementation closer to the actual mathematical model, Clojure code can be simple to read and maintain.
;; pipe arg to function (-> "x" f1) ; "x1" ;; pipe. function chaining (-> "x" f1 f2) ; "x12"
Identity and state in Clojure provide safety
In Clojure, each model’s identity has one state at any point in time. That state is a true value that never changes. If the identity appears to change, this is because it’s associated with a different state. New values are functions of old. Inside each layer of the neural network, the state of the original data is always preserved. Each set of data with new values that are outputs of functions can operate independently. This means that actions can be performed on these sets of data safely or without regard to contention. We can refer back to the original state of the data at any time. Therefore, consistency, in this case, means safety.
Libraries and Limitations
Historically, the cortex machine learning library contains all you need to implement machine learning algorithms in Clojure. With the recent rising popularity of the open-source MXNet framework for deep learning, it is easier to implement deep learning using the MXNet-Clojure API.
Although there are now different APIs and machine-learning libraries available for Clojure, there is still a steep learning curve to becoming fluent in it. Error messages can be cryptic and companies will need to be willing to invest upfront to use it to scale up their machine learning systems. As more examples of production-ready systems are written in Clojure, the language will gain more popularity over the coming years, but only if the number and size of libraries accompanying the usage of Clojure grows consistently.
Haskell is a functional language that is statically typed with type inference and lazy evaluation. It is based on the semantics of the Miranda programming language and considered to be more expressive, faster, and safer for implementing machine learning.
Type safety in Haskell provides safety and flexibility
Type safety defines constraints on the types of values a variable can hold. This will help to prevent illegal operations, provide better memory safety, and lead to fewer logic errors. Lazy evaluation means that Haskell will delay the evaluation of an expression until its value is needed. It also avoids repeated evaluations, which will save running time. At the same time, lazy evaluation allows for infinite data structures to be defined. This gives a programmer unlimited mathematical possibilities.
Simple explicit code in Haskell provides clear implementations
One of the biggest benefits of Haskell is that it can describe algorithms in very explicit mathematical constructs. You can represent a model in a few lines of code. You can also read the code in the same way you can read a math equation. This can be very powerful in complex algorithms such as deep learning algorithms in machine learning. For example, the below implementation of a single layer of a feed-forward neural network shows just how readable the code can be.
import Numeric.LinearAlgebra.Static.Backprop logistic :: Floating a => a -> a logistic x = 1 / (1 + exp (-x)) feedForwardLog :: (KnownNat i, KnownNat o) => Model (L o i :& R o) (R i) (R o) feedForwardLog (w :&& b) x = logistic (w #> x + b)
Multicore parallelism in Haskell provides performance
In deep learning, typical neural networks will contain a million parameters that define the model. Also, a large amount of data is required to learn these parameters, which, computationally, is very time-consuming. On a single machine, using multiple cores to share the memory and process in parallel can be very powerful when it comes to implementing deep learning. In Haskell, however, implementing multicore parallelism is easy.
Libraries and limitations
Haskell’s HLearn library contains machine learning algorithm implementations, while the tensor-flow binding for Haskell can be used for deep learning. Parallel and Concurrent, meanwhile, are used for parallelism and concurrency.
Although there are some machine learning libraries developed in Haskell, ground-up implementations will still need to be done for production-ready Haskell implementations. While public libraries available for specific deep learning and machine learning tasks are limited, Haskell’s usage in AI will also be limited. Companies such as Aetion Technologies and Credit Suisse Global Modeling and Analytics Group are using Haskell in their implementations—here’s a complete list of the organizations using it.
Deep learning models are complex mathematical models that require specific layering of functions. Functional programming languages such as Clojure and Haskell can often represent the complexity with cleaner code that’s closer to the mathematics of the model. This leads to time savings, efficiency, and easy management of the code base. Specific properties of functional programming allow the implementations in these languages to be safer than those of other languages. As development in AI technology progresses, evaluating these languages for the needs of large-scale system-development projects in AI will become more prevalent.
This article is part of Behind the Code, the media for developers, by developers. Discover more articles and videos by visiting Behind the Code!
Want to contribute? Get published!
Follow us on Twitter to stay tuned!
Illustration by Victoria Roussel