Approximations to uncomputable universal measures of complexity
I introduced novel approaches to approximate and apply non-computable functions that exploit and unleash the full power of universal measures of complexity, that is, measures that characterize the properties of an object independent of the ways in which such an object is described. I have contributed with the only method other than lossless compression to approximate the Kolmogorov complexity of strings (a method with a stronger theoretical foundation than compression). See papers: J6, J17, J20, J22. And applications of these tools to several different areas. See papers: J16, J13, J10, J8, J4, J2, J18, J19, J21, J26, J27, J29, J25 and P10. I have provided numerical evidence in favor of Levin's universal distribution and Solomonoff's algorithmic probability bias towards simplicity. See papers: J13, J16, J22, J21, P31, P32 and P24). Collaborators: Fernando Soler-Toscano, Jean-Paul Delahaye, and Nicolas Gauvrit.
The human mind is algorithmically biased to suit specific purposes. Some of these algorithmic mechanisms can replace previously considered biases based on previous experiences (e.g. the so-called system 1). We have known that random generation tasks, in which people have to generate random-looking sequences, are linked to some important aspects of mental and cognitive capacities. We have introduced a new approach to the study of the behaviour of animals and humans based on algorithmic information theory. We have shown that humans best outsmart computers when they are tested on randomness generation at the age of 25. When competing against all possible algorithms, we have found that human behaviour is at its most algorithmically random potential when reaching 25 years of age, thereby introducing a new measure of cognitive complexity possibly related to other biological and cultural aspects, such as creativity. Indeed, humans produce more randomness than most computer programs when they are 25 years old. In parallel, we are also working on measures of integrated information to profile brain networks. Relevant papers are J33, J18, J21, J25, J8 and survey P26 Main collaborators: Nicolas Gauvrit, Peter Brugger, Jesper Tegnér, Alberto Hernández, Antonio Rueda-Toicen, James Marshall and Juan Carlos López.
Equipped with the powerful measures I have developed with my team, we tackle a fundamental challenge in science, that of developing tools for causal discovery in order to unveil design principles and generating mechanisms of arbitrary dynamic systems. In particular, the development of an interventional calculus based upon the theory of computability and algorithmic probability that identifies the key markers in genetic networks with which to steer and manipulate cell function and cell fate. The range of application of this work is very general and aims to generate effective intervention tools to steer the causal content and thus the fate of artificial and natural complex systems. I devise strategies to understand, test and reprogram artificial and natural systems as models of computation (see e.g. papers J24, J16, J9). I have also devised 2 measures for testing the algorithmicity and programmability of a system (see also in proceedings P22, P23, P15, P13). We are also introducing a perturbation-based calculus to study the information landscape of complex systems and networks capable of unveiling information on the energy landscape with a measure of algorithmic (re)programmability that connects the theory of dynamic systems with (algorithmic) information theory to reveal the dynamic landscape at the heart of the problem of causality. This should allow us to find clues to move systems from among different functions thereby effectively reprogramming them. Collaborators: Narsis A. Kiani, Jesper Tegnér et al. See papers: J26, J27 and J29, J30, J31, J32 and proceedings P26 and P27.
Spatial & Molecular Computing
While cellular automata are a type of spatial computing, I have also explored some other types of spatial computing such as Wang tiles. I contributed a method based upon algorithmic complexity to analyze and characterize the conformational spaces of molecular-computing systems, with a view to better understanding and reprogramming natural systems such as chemical and biological molecules. The Wang tiles emulate Porphyrin molecules, the molecules that give the red color to your blood. We have shown how algorithmic complexity can inform the way in which computation can be driven towards different qualitative behaviours to accomplish different tasks. Collaborators: Germán Terrazas and Natalio Krasnogor. See papers: J34, P31, P32 and P24 and J16, J21, J13 and J28.
I have written many small pieces of software explaining a concept in science, mathematics or logic. Some of my favourites are:
Public Understanding & Engagement
I also consider important to contribute to places like Quora, here are some of my favorite answers and questions:
I encourage other colleagues to do the same to fight misinformation in all its forms. Marches for science are great but I am certain that people expects more effective ways from us. In my opinion there are two main activities that we can do with immediate impact:
Massive Open Online Course
Supported by the Foundational Questions Institute, the Wolfram Foundation and the Santa Fe Institute, I am preparing a course on Information Dynamics of Complex Networks with my colleagues Narsis A. Kiani and Antonio Rueda-Toicen (Teaching Assistant). The course will draw heavily upon information theory and complexity science to explore the cutting-edge field of systems and network biology. This will be a computer-program oriented course on tools of information and computer science for computational biology to explore the surprising origins of life's complexity. Visit the course website.
Reprogramming the world: the pervasiveness of Turing-universality
While it was known that Turing universality could be achieved with extreme simplicity (in terms of resources i.e. state+symbols), it was not known how pervasive it was. Once a computer program is Turing universal it can simulate any other computer program. Our work is the first to shed some light on quantifying how pervasive Turing universality may be. It turns out that the number of these, are of density 1, meaning that basically almost all computer programs are capable of universal computation with the appropriate initial conditions. We also showed how computer programs previously thought to be essentially different, can actually simulate other computer programs of unbounded complexity. In this sense, we have shown a complete collapse of the so-called Wolfram classes in a fundamental way, but we have also shown that there is an asymmetry in the number of possible compilers accessible for a given exhaustive compiler exploration time to perform these simulations that reintroduces a hierarchy re-establishing the classes from an epistemological perspective. The fundamental collapse, however, strengthens Wolfram so-called Principle of Computational Equivalence. By way of the mentioned asymmetry we also defined a topological measure based upon simulation networks establishing the degree of difficulty to find the right compilers to make a program simulate others. All this constitutes a completely novel Bayesian approach to Turing universality and a powerful framework to reprogram system's behaviour. These ideas are also consistent across my many first views on programmability based on Turing's 'imitation game' for testing black box capabilities from input perturbation (interrogation) and output assessment. We propose to use the tools of algorithmic complexity to study such mappings as an ultimate behavioural evaluation framework. Collaborators: Juergen Riedel. See papers: here and P28.
Algorithmic Law of Computation vindicate and validate Occam's simplicity bias
Occam's razor is a principle that establishes that among competing hypotheses, the one with the fewest assumptions should be selected. There have been attempts to derive Occam's Razor from probability theory, notably by Harold Jeffreys and E. T. Jaynes. Using Bayesian reasoning, a simple theory is preferred to a complicated one because of a higher prior probability; however it leads to a circular argument. Solomonoff and Levin's approach, on the other hand, provides a non-circular sound justification for the prior, favouring simplicity over complexity by assuming a recursive generating mechanism. The main idea is that if a computer program is a generating cause for some phenomenon, the shorter the program the greater its likelihood of being produced by chance than a larger one. Yet, Occam's principle is simply encoded or transformed into favouring shorter computer programs over longer, something that has triggered some criticisms (e.g. Kevin Kelly work at CMU). This is because it was unknown whether the theory of algorithmic probability corresponded to reality. Which is where I came in, demonstrating that, when taking all programs up to a fixed size, the most likely computer program that produces an output is also the shortest one. This means that, indeed, determinism (recursiveness) alone leads to the confirmation of Occam’s razor, and it shows the power of using computation in philosophy. My research reveals a far reaching result showing real empirical evidence of the validity of Occam’s razor as a direct consequence of a purely mathematical, rather than a philosophical, principle, one that only requires---as predicted by algorithmic probability---determinism as a sufficient condition in the form of algorithmic computation based upon a Bayesian model, with Solomonoff and Levin's Universal Distribution as a prior. See papers: J13, J16, J22, J21, P31, P32 and P24.
Network Cosmology & Fine-tuning
The emergent properties of the universe are traditionally attributed to special values of fundamental constants. The universe in its present state is lumpy with structures such as stars and galaxies, black holes and life, at least according to our best models based on the available observable data. But while the presumed early state and the projected late universe are similar in that they are both characterized by minimal entropy, this is so for different reasons. More importantly, structures prevail and entropic histories allow order hitherto only explained by anthropic principles. Our current on-going project explores these questions by performing computer simulations to analyze the initial conditions and entropic evolutions of computational universes. We test whether habitable zones of increasing entropy and free algorithmic complexity where structures like life may exist in candidate universes are common and robust or rare and fragile. Our approach tackles a foundational question in modern physics related to cosmology and complexity theory (and other areas such as network biology where causal networks of interacting units are fundamental) concerning the specificity of our universe and the initial conditions that were needed to give rise to and sustain structure, life and intelligence. Collaborators: Jurgen Riedel.
The Busy Beaver Conjecture
I have proposed what I call the Busy Beaver conjecture, establishing that all Busy Beaver Turing machines (with more than 2 states) are candidates for Turing universality. This is because these Turing machines are the deepest in the Bennett sense, they are able to halt but also to display the greatest algorithmic complexity. I came up with a new Bayesian approach to gather evidence in favour of Turing universality by way of intrinsic universality (which is a stronger form of Turing-universality) by behavioural qualitative reprogramming. Collaborators: Juergen Riedel. See papers: J7 and P28 and a forthcoming one currently in the ArXiv.
I am a strong supporter and pioneer of a strong form of Computational/ Experimental Philosophy (a weaker version has been suggested, consisting of viewing philosophy through computation and vice versa). Almost every paper of mine follows a common structure starting from a general philosophy-of-science question. Then I perform numerical experiments based upon small computer programs as models of the phenomenon in question. Finally, I base my conclusions on the experimental results obtained.
I have strong interests in topics at the intersection of computation and philosophy, such as simulation, reality and fine-tuning, all of which I pursue by performing actual numerical experiments with computer programs as possible models of the world. I have undertaken a 'deep space' exploration of the computational and mathematical universes. You can find this approach in almost every paper of mine.
Symmetry Breaking via Computation: An Algorithmic View of the Universe
I have shown that the emergence of structure and symmetry breaking can be derived from perfectly symmetric rules. Indeed, symmetry breaking is key not only to connecting e.g. Turing's theory of universal computation with his theory of biological pattern formation--morphogenesis as he saw in the breaking of symmetry the key to pattern formation-- but to also explaining how this symmetry breaking emerges from the perfect symmetry of deterministic rules via the highly asymmetric Universal Distribution (Levin, Solomonoff) in the way in which computable outputs distribute, giving meaning to the concept of randomness versus simplicity. I illustrated the way in which algorithmic probability differs from classical probability theory. Foundational and technical papers include: J24, J13, J7, and J6.
A hierarchy of programmability
Along these lines, I came up with a hierarchy of programmability based on a variability and controllability analysis that in turn is based upon a novel approach to testing qualitative behaviour using a form of an 'algorithmic Turing test' with a universal compressor evaluating the output of other computer programs seen as black boxes. See papers: J9, P28, P22, P17, and P13.
Algorithmic Data Analytics
Because data of interest is usually, if not always, produced by mechanistic and algorithmic causes, I have proposed an empirical Bayesian approach to science where uninformative priors are not uniform flat (said uninformative) distributions but Levin's so-called Universal distribution based on algorithmic probability (Solomonoff). I am thus introducing different tools and methods than those based on (statistical) machine learning and Shannon entropy which are more likely to produce false models from spurious correlations specially in the advent of Big Data. To this end I am introducing algorithmic-information tools better equipped to model and deal with the discovery of first and design principles and generating mechanisms (causes). See papers: J18, J20, J26, J27, J31, J31, J32 and P29, P30, P26.
The Vermont & Bloomington Roundtables
I organized and moderated what I think will historically be among some of the most interesting public discussions of some of the most brilliant minds in the science of the digital. The two roundtables included thinkers such as Gregory Chaitin, Stephen Wolfram, Edward Fredkin, Anthony Leggett, Rob de Ruyter, Tommaso Toffoli, Karl Svozil, Cristian Calude, and Paul Davies. Transcripts of the roundtables can be found in my edited volumes Randomness Through Computation and A Computable Universe. I am in debt to Adrian German for his help in co-organizing with me one of the roundtables.
Konrad Zuse's Calculating Space Revision
With Adrian German (Indiana U. Bloomington), I rewrote (and made some corrections) to Konrad Zuse's Calculating Space (Rechender Raum), with permission from Zuse's eldest son (and from MIT who made the first English translation), Horst Zuse, with whom I had the chance to meet and dine with in Berlin, Germany in 2006. The file can be found in the ArXiV here and in my A Computable Universe book.
Graph and Tensor Complexity
I have introduced native measures of universal complexity i.e. independent of description language and selected feature(s). This was by way of algorithmic probability based on a generalization of a universal n-dimensional Turing machine rather than the a unidimensional tape. I have shown powerful properties of the measure, correlated to intuitive and formal symmetries, and other algebraic and even physical properties. My approach is a solution to the challenge of graph complexity previously identified by Gell-mann and Feldman as an open problem. See papers: J19 and J29. Collaborators: Fernando Soler-Toscano, Santiago Hernández, Antonio Rueda-Toicen, Narsis A. Kiani, Kamaludin Dingle, and Ard Louis.
To illustrate the problem of ad-hoc measures I constructed a network whose node degrees follows the so-called Champernowne artificially-constructed Borel normal constant thus indicating maximal entropy (and maximal entropy rate) by construction, though its edge density asymptotically approximates zero and thus the entropy rate of its adjacency matrix also converges to zero, in contradiction to the entropy rate of its degree sequence. So we presented a graph that, depending on the choice of description, has the same maximal and minimal information-content according to Shannon entropy for the same uninformative choice of distribution (uniform) in both cases, thereby establishing the fragility and description-dependent nature of Shannon entropy. Collaborators: Narsis A. Kiani. See preprint here.
The Online Algorithmic Complexity Calculator (OACC)
The OACC is a powerful free online calculator designed to provide estimations of algorithmic complexity (a.k.a. Kolmogorov-Chaitin complexity) and Algorithmic Probability for short and long strings, and for 2-dimensional arrays better than any other tool, and estimations of Bennett's Logical Depth (or LD) for strings hitherto impossible to estimate with any other tool. These 3 measures constitute the most powerful and universal algorithmic measures of complexity. It uses a method (called BDM) based upon Algorithmic Probability, which is compatible with---but beyond the scope of---lossless compression algorithms that are so widely used to estimate algorithmic complexity. Implementations of lossless compression are, however, entirely based upon Shannon entropy (e.g. LZ, LZW, DEFLATE, etc) and thus cannot capture any algorithmic content beyond simple statistical patterns (repetitions).In contrast, our BDM approach not only considers statistical regularities but is also sensitive to segments of an algorithmic nature (such as in a sequence like 12345...), which S and lossless compression algorithms would only be able to characterize as having maximum randomness and the highest degree of incompressibility. Moreover, unlike K (thanks to the Invariance Theorem) both Entropy and Entropy-based compression algorithms are NOT invariant to language description and are therefore neither suitable nor robust as measures of complexity. The OACC has been featured in the Newsletter of the London Mathematical Society and can be accessed here.
Computation and Causation
I use computer programs as models of the world. Determinism from classical mechanics implies that everything can be seen as a computer program (quantum mechanics may not be an exception; but we just don't know). I began computing the uncomputable, with numerical approximations to universal measures and we are now introducing a powerful calculus and axioms of causation based on algorithmic probability. Collaborators: Jesper Tegnér, Narsis A. Kiani, and Yanbo Zhang. See papers: J18, J20, J26, J27, J31, J31, J32 and P29, P30, P26.
Cellular Automata, Emergence & Undecidability
I have contributed with formal behavioural classifications based on algorithmic complexity and asymptotic behaviour, giving answer to one of Wolfram's open questions on the fraction of behaviour classes in CA rule spaces (see paper J15), a classification by compression (paper J1), interaction of CAs (see paper P24 and more forthcoming) and a new proof of universality in Elementary Cellular Automata (ECA) (to be available soon). Other papers on CA: J34, J12, J22 and J24. We have numerically explored the concept of computation to enrich the philosophical discussion about its origin and implications. See paper: J5.
Open-ended Evolution (OEE) and Algorithmic Creativity
We have explored how non-trivial deterministic systems can display unbounded change in complexity. We have proved that undecidability is essential for legitimate forms of OEE definitions and that this is also deeply connected to open and closed systems, with self-referential state-dependent systems displaying a greater degree of versatility. For OEE to be possible (under reasonable definitions), then open and undecidable dynamic systems are necessary. Main collaborators: Santiago Hernández, Alyssa Adams, Sara I. Walker, Paul Davies, and Joost Joosten. See papers: J34, P31, P32 and P24.
Tradeoffs of Complexity Measures
I am interested in how measures such as entropy, algorithmic randomness, logical depth, algorithmic probability, time complexity and fractal dimension may be related to each other. Most measures are statistical in nature, such as entropy. We have found, for example, trade-offs between program-size and time computational complexity, and between time (process) complexity and fractal dimension. We have also shown the limitations of computable measures of complexity, such as Shannon Entropy. See papers: J28, J17, J3, J15, J23, J22, J1 and proceedings P4, P9, P25 and P10. See also important preprint here. Collaborators: Joost Joosten, Fernando Soler-Toscano, Narsis A. Kiani.
Semantics & the Computational Value of Information
Concerned by the semantics of information, my collaborators and I have looked into different aspects of value in producing and using information for computation. Information is relative; it only makes sense as mutual information. It is indistinguishable from meaning as opposed to (algorithmic) randomness and how different measures of complexity relate to each other and how can they be classified in major groups also provides an all-encompassing natural classification of information. I am helping to understand the meaning of meaning and information in the context of algorithmic complexity. We have found that strategies implemented in automatic theorem proving involve an interesting tradeoff between execution speed, proving speedup/computational time and usefulness of information. And that theorem provers are subject to the same non-linear tradeoff between time and size as computer programs are, affording the possibility of determining e.g. optimal timeouts and waiting times in mathematical proving. Moreover, we have substantiated the way in which intelligent animals make use of the structured nature of the world to extract information and develop/evolve learning mechanisms.
Collaborators: Joost Joosten, Wilfried Sieg, Santiago Hernández and Francisco Hernández-Quiroz, James Marshall. See papers J10, P10 and preprint of forthcoming paper here.
Contributions to Industry
I have contributed some built-in functions to the core kernel of the Wolfram Language and Mathematica, functions mostly related to computational linguistics such as WordData (based on Princeton's WordNet), LanguageData and DictionaryLookup. I have also contributed to the creation and development of Wolfram|Alpha, the computational knowledge engine, on aspects of languages and linguistics, theorem proving, automatic data curation, knowledge discovery and fast prototyping. I was part of the first original team that developed Wolfram|Alpha working with only 4 or 5 other people in a small office in Cambridge, MA in Stephen Wolfram's Special Projects group.
Current and future directions
Current and future directions of research include: algorithmic feature selection and model generation; connections between dynamic algebraic (spectral) graph theory and algorithmic information (see paper P29); non fine-tuned models of causal networks; and applications of our algorithmic calculus to disentangling interconnected networks.
Creativity & Cooking
In my quest to understand the process of, and the sources of creativity, I have explored apparently disparate areas, such as cooking. I have taken courses at the Cordon Bleu in Paris and London, at Brookes University (wine first certificate), at the Raymond Blanc Cookery School in Oxford, and at private courses in Tokyo, Japan. A good chef is a cook that becomes a scientist in the kitchen, mastering the right ingredients between precision and experimentation in the process of innovation.
Key concepts from my research explained in
a computable piece of uncomputable art:
This picture, titled “Visualizing the Computational Universe: Runtime Space in a Peano Curve," won me 1st. prize in the Computational Imagery category of the "2011 Kroto Institute Scientific Image Competition" held in the UK. Here is the explanation: It represents a small slice of the computational universe of all possible computer programs.
Each pixel represents a computer program with empty input running on an abstract computer (a Turing machine). The color indicates the runtime of the program upon completion of its computation. The darker a pixel the longer the program took to produce its output. White pixels are computer programs that do not produce any output because they never halt (just as happens when your computer stalls forever, sometimes making you reboot).
Red pixels are the computer programs that take the longest time—relative to other programs of the same size—before finishing (they are also called Busy Beavers). I call this color type coding the computability spectrum. Programs were sorted by size from smallest to largest (in terms of number of instructions) in such a way that one could be certain that any program would eventually be reached (including versions of MS Windows and Angry Birds that can be encoded as one-run computer programs).
Because there is no upper limit to the size of programs, the space of all computer programs is infinite but discrete, perhaps mirroring our own world. Because an enumeration of computer programs is linear (you start with program 1, then proceed to program 2, and so on) pixels depicted in this 2-dimensional figure are disposed along a "space-filling" Peano curve. This is a curve that guarantees that 2 contiguous elements in the enumeration remain as close as possible to each other in the 2D configuration.
The picture has the striking particularity that it is ultimately uncomputable. Imagine you point a telescope and take a picture of an object in the computational universe for which some pixel values cannot be determined, not only in practice but in principle-- that is, no matter how good your camera, or how many terapixels some future version of it may have. This slide of the computational universe was only possible because it is a tiny region of small computer programs for which all pixels can be determined by zooming in deep enough.
However, Gödel and Turing proved that if you zoomed out to a larger view of the computational universe, the picture would begin to look completely blurry because pixel colors would not be determinable. In other words this picture is the inverse equivalent of Hubble's Deep Field space--the larger the space field the blurrier.
While we can only have access to the visible universe determined by the speed of light, we can only see a small part of the computational universe due to similar limitations both physical and computational (it seems we cannot build more powerful computers than the ones we already have to overcome these limits). Hence, most of the computational space will remain unknowable, not only due to a limitation of resources but due to a fundamental barrier established in the digital world and extending into our own reality.
Only incremental improvements to this picture are possible. While the picture is ultimately uncomputable and unattainable, one can make an informed guess as to what it would look like if zooming out. The theory predicts the average color of the picture at a greater scale, this theory being the theory of algorithmic probability, closely related to what is known as the Chaitin Omega number, or the halting probability. It turns out that the picture will be mostly white, as in an ever expanding universe, finding computer programs that halt and produce something will be increasingly harder. Indeed, most programs will never halt, and those that halt will do so relatively quickly in proportion to their size, as Prof. Cristian Calude has theoretically postulated and I have numerically confirmed numerically (see paper P10).
Algorithmic probability also tells us that most programs will produce an output following exponential decay (-log_2 Pr(s)) hence most of them will produce the same strings and the fewest will produce algorithmically random digital sequences. In other words, among the computer programs that do produce an output, most of them will be highly ordered, as if they were creating a structured digital universe of highly ordered patterns out of nothing (just like ours). Some of these programs that produce the same output may do so faster than others, but precisely how much faster a program can be is another open foundational question and one of the greatest challenges in mathematics and computer science. If it were possible to find a program considerably faster than any other, then everything could be computed in 'reasonable' time, regardless of the apparent difficulty of the computation.
A full technical explanation of some of this and of how the image was generated was published in this paper:
H. Zenil, From Computer Runtimes to the Length of Proofs: With an Algorithmic Probabilistic Application to Waiting Times in Automatic Theorem Proving In M.J. Dinneen, B. Khousainov, and A. Nies (Eds.), Computation, Physics and Beyond International Workshop on Theoretical Computer Science, WTCS 2012, LNCS 7160, pp. 223-240, Springer, 2012.
A preprint is available online here.