Big Trouble in Little Quanta: A critique of complex-valued networks for NLP explainability

What do you do on vacation? Do you relax on a nice, hot sunny beach sipping an ice-cold drink? Do you travel across Southeast Asia with a loved one, eating new and delicious foods? Perhaps you enjoy adventuring and hiking in the beautiful Swiss Alps? Or maybe you’re more like me: someone who stays at home catching up on machine learning papers. One paper in particular caught my full attention: CNM: An Interpretable Complex-valued Network for Matching by Qiuchi Li, Benyou Wang and Massimo Melucci [1]. In the paper, the authors propose modelling human language with the mathematical framework of quantum physics, specifically by representing language in a Hilbert space. Much as word2vec or GloVe map words to a real-valued vector space, CNM maps words and sentences to a complex-valued vector space. What makes the paper interesting, however, is the authors’ suggestion that in a complex-valued space, phase and amplitude can relate to properties such as polarity, ambiguity or emotion. In essence, the framework of quantum mechanics serves as a tool for explainability in natural language question answering. The paper even won Best Explainable NLP Paper at NAACL 2019.

Having studied physics in my undergrad, I was naturally drawn to the paper. The paper suggests that Hilbert space provides a good framework for model explainability, and I certainly thought this was an interesting and novel approach. However, after reading it, I couldn’t help but feel a bit misled. I felt there was not only a misuse of ideas from quantum mechanics, but also a lack of evidence and (ironically) explanation for a framework intended for explainability.

What’s the big deal with a little quanta?

Right off the bat, I began to feel uneasy with the authors’ analogies to quantum mechanics. I’ve always had an avid interest in science, so misrepresentations of scientific theories naturally bother me. What do I mean by this? One famous example is quantum healing, a term coined by author and alternative medicine advocate Deepak Chopra. Chopra suggests that phenomena such as sudden and dramatic healing relate to quantum mechanics and consciousness. Quantum mechanics is the framework we use to describe the physics of atoms and subatomic particles; it is a theory of physical systems, not of consciousness or health. Not only is Chopra’s use of quantum mechanics incorrect, it is also irresponsible: readers who are ill and not scientifically savvy might put their trust in nonsense with no scientific basis, hoping it will improve their health.

I want to make it clear that I don’t claim to be an expert in quantum mechanics. However, I believe I’ve used the proper resources and tools to draw educated conclusions. With regard to CNM, I saw ideas that could easily be misconstrued, and I felt the need to write about them.

In the data science and machine learning community, we like to write blogs or tutorials on great new ideas, but I feel we don’t write enough critical pieces. By writing about new ideas from a more critical angle, I believe we can help cut through the noise and push the great ideas forward, in turn building a healthy and trustworthy community. This is especially needed at a time when machine learning research is all the rage and new ideas and theories are put out faster than we can review them.

Troubling trends in exciting times

After reading CNM, I was left with a lot of things about the paper that bothered me. To help me identify its issues, I read another paper, Troubling Trends in Machine Learning Scholarship by Zachary C. Lipton and Jacob Steinhardt [2]. It describes troubling trends the authors have observed in the machine learning research community, illustrates them with examples, and proposes solutions. The authors outline four common trends in machine learning papers:

Failure to distinguish between explanation and speculation
Failure to identify the sources of empirical gains, e.g. emphasizing unnecessary modifications to neural architectures when gains actually stem from hyper-parameter tuning
Mathiness: the use of mathematics that obfuscates or impresses rather than clarifies, e.g. by confusing technical and non-technical concepts
Misuse of language, e.g. by choosing terms of art with colloquial connotations or by overloading established technical terms

The paper was very enlightening, and I would highly recommend reading it. It should be noted that these trends are not unique to machine learning; I encountered the same issues myself while doing applied math research as an undergrad. As data scientists, machine learning researchers, or simply people interested in the field, we should take on some of the onus of being critical of what we read. I understand, though, that in a field like machine learning, where new research comes out at a great rate, we can sometimes mistake the rate of output for actual progress. To quote computer scientist Drew McDermott on the field of AI research [3]:

Explanation vs Speculation

In the Abstract of the paper [1], the authors state that “with well-constrained complex-valued components, the network admits interpretations to explicit physical meanings”. What do the authors mean by explicit physical meanings? They are referring to the two components of a complex value: its amplitude and its phase. They elaborate further in the Introduction, stating that amplitudes correspond to lexical meaning, while phases implicitly reflect higher-level semantic aspects such as polarity, ambiguity or emotion.

In the paper, however, there are no tests or experiments to back up these bold claims. The authors return to complex phase and amplitude in Section 6.3, in the Discussion. Even there, they provide no evidence for how complex phase can represent higher-level aspects such as polarity, ambiguity or emotion, nor for how amplitudes can represent lexical meaning. Instead, they only go into the mathematical definitions of phase and amplitude for a complex number.
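Those definitions are the standard polar form of a complex number, z = r·e^(iθ), where r is the amplitude and θ the phase. As a small illustration (standard library only; the helper name `amplitude_and_phase` and the example value are my own, not taken from the paper), the sketch below computes both in one call and rebuilds the number from them. The phase comes out as an angle and nothing more; any link to polarity or emotion would have to be shown empirically, not read off the definition.

```python
import cmath
import math


def amplitude_and_phase(z):
    """Return (r, theta) such that z = r * exp(i * theta).

    Hypothetical helper for illustration: it simply wraps cmath.polar,
    which gives r = |z| and theta = arg(z) with theta in [-pi, pi].
    """
    return cmath.polar(z)


# An arbitrary example value (illustrative only, not from the CNM paper).
z = complex(1.0, 1.0)
r, theta = amplitude_and_phase(z)

# Rebuilding z from (r, theta) recovers it up to floating-point rounding:
# the pair is just another way of writing the same number.
z_back = cmath.rect(r, theta)

print(f"amplitude = {r:.4f}")
print(f"phase     = {theta:.4f} rad ({math.degrees(theta):.1f} degrees)")
print(f"round trip matches: {cmath.isclose(z, z_back)}")
```

Swapping in any other complex value changes the numbers but not the point: amplitude and phase are a change of coordinates, not an interpretation.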

This claim is passed off as if it were intuitively obvious. For a model and framework whose purpose is explainability, there is a lack of explanation for these ideas.

Mathiness

Using Hilbert spaces for machine learning is certainly novel and exotic, but I can’t help feeling that’s all it is. I believe the paper confuses the mathematical framework the authors use with the physics that framework was developed to describe. Just because you can apply Hilbert spaces to your problem doesn’t mean your system exhibits any quantum behaviour. It’s like saying that your system exhibits relativistic properties because you use Einstein notation.

This isn’t to say you can’t use Hilbert spaces or complex vector spaces for your problem, but when the quantum mechanics jargon is carried along with them, readers may struggle to distinguish the mathematics from the physics.

It’s certainly impressive to see bra-ket notation throughout the paper, especially for readers with little to no familiarity with quantum mechanics. However, bra-ket notation is a notational convenience, not a new type of mathematics. Put simply, it is another notation for linear algebra, and the formulas stated in the paper are standard linear algebra definitions: dot products or matrix products with a “different skin”. I feel the bra-ket notation is used more to attract the reader and give the perception of technical depth than to serve an actual purpose.
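To show how thin that “different skin” is, here is a minimal sketch in plain Python (no libraries; the helper names `bra`, `braket` and `ketbra` are hypothetical, chosen for illustration) that writes the textbook bra-ket expressions as ordinary linear algebra. A ket |ψ⟩ is a column vector, the bra ⟨ψ| is its conjugate transpose, the bracket ⟨φ|ψ⟩ is a conjugated dot product, and |ψ⟩⟨ψ| is an outer product. These are the generic definitions, not code from the CNM paper.

```python
import cmath
import math

# Kets are represented as plain lists of complex numbers (column vectors).


def bra(ket):
    """<psi| is the conjugate transpose of |psi>: conjugate every entry."""
    return [c.conjugate() for c in ket]


def braket(phi, psi):
    """<phi|psi> = sum_i conj(phi_i) * psi_i, an ordinary Hermitian dot product."""
    return sum(a * b for a, b in zip(bra(phi), psi))


def ketbra(psi, phi):
    """|psi><phi| = outer product of psi with the conjugate of phi (an n x n matrix)."""
    return [[p * q for q in bra(phi)] for p in psi]


# Two illustrative unit vectors in C^2 (arbitrary values).
psi = [1 / math.sqrt(2), 1j / math.sqrt(2)]
phi = [1 + 0j, 0j]

print(cmath.isclose(braket(psi, psi), 1))  # <psi|psi> is just the squared norm
print(braket(phi, psi))                    # a conjugated dot product
print(ketbra(psi, psi))                    # the "projector" is an outer product
```

With NumPy the same operations are `np.vdot(phi, psi)` (which conjugates its first argument) and `np.outer(psi, psi.conj())`; the bra-ket symbols add no mathematics beyond these.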

Misuse of Language

I believe that the paper misuses the language and concepts of quantum mechanics to justify the use of complex-valued spaces. In the Introduction [1], the authors state: “Intuitively, a sentence can be treated as a physical system with multiple words (like particles), and these words are usually polysemous (superposed) and correlated (entangled) with each other.”

Let’s take a moment to think about this statement. How are words or sentences, in the linguistic sense, analogous to physical particles like electrons? As with Deepak Chopra’s explanations, quantum mechanics is bent to fit a particular narrative. Quantum mechanics describes the physics of particles at the quantum scale. Analogies like these are far-fetched and flat-out unreasonable. Using quantum physics to explain language implies that words exhibit quantum phenomena.

Again, I believe it’s fine to propose a framework that uses a complex-valued vector space, but the paper seems to blend the mathematical framework it uses with the physics that shares that framework.

The authors even state in the Introduction that “complex values are crucial in the mathematical framework of characterizing quantum physics. [Therefore] to preserve physical properties, the linguistic units have to be represented as complex vectors or matrices”. The issue is that linguistic units are neither physical systems nor quantum ones, so why would they need to preserve physical properties?

In the authors’ defence, they cite two papers suggesting that human cognition and language understanding exhibit quantum-like phenomena [4], [5], but I still believe it’s a stretch to connect cognitive science ideas about “consciousness” to a mathematical framework for modelling language.

Final remarks

I do appreciate the authors’ efforts in trying to create a tool for explainability. However, in CNM there is too much focus on how well the model performs compared to current methods rather than on explainability: why does it work? We’re drawn to the fancy and exotic bra-ket notation and the allure of quantum mechanics, which takes our attention away from the real issues of the paper. Does it really make sense for language to behave as a system of elementary particles? What evidence do the authors provide to support such a claim?

Perhaps complex vector spaces can be useful in the pursuit of explainability. Instead of complex numbers, maybe quaternions or dual numbers can help. As fancy as these things are, they are just mathematical frameworks. Quantum mechanics uses Hilbert spaces to describe physics, but that’s all CNM and quantum mechanics have in common: the math.

References

[1] CNM: An Interpretable Complex-valued Network for Matching by Qiuchi Li, Benyou Wang and Massimo Melucci (2019)

[2] Troubling Trends in Machine Learning Scholarship by Zachary C. Lipton and Jacob Steinhardt (2018)

[3] Artificial Intelligence Meets Natural Stupidity by Drew McDermott (1976)

[4] Quantum Entanglement in Concept Combinations by Diederik Aerts and Sandro Sozzo (2013)

[5] Entangling Words and Meaning by Peter Bruza, Kirsty Kitto, Douglas Nelson and Cathy McEvoy (2008)