This page’s topics are History of ideas and Rationalism.

Meaningness is a hypertext book (in progress), plus a “metablog” that comments on it. Copyright ©2010–2018 David Chapman.

## Comments

## The marriage of predicate

The marriage of predicate logic and probability is indeed an ongoing problem. Most current work in Artificial Intelligence focuses on quantifying over probabilities. Partially due to neglectedness, and partially because we suspect that quantifying over probabilities is not going to end up a good approach, MIRI’s workshops on logical uncertainty have tended to attack the problem from the opposite direction, assigning probabilities to logical statements, e.g:

But none of this is challenging the Kolmogorov axioms or denying Cox’s Theorem, because once you assign probabilities to logical formulas, those probabilities obey the standard probability axioms. It might be better to say that the theory of uncertain reasoning extends the theory of certain reasoning, if you object to the phraseology which says that probability extends logic.

Another obvious example of an element of rationality that’s not contained in the Kolmogorov axioms for probability is, of course, the prior. But since logic never addressed priors either, this doesn’t mean that the theory of uncertain reasoning fails to extend the theory of certain reasoning.

What you’re really talking about, I think, is a mixture of the problem of extending our priors to logical theories that might be true of the empirical world, and the problem of relating logical theories to the empirical world at all. If I knew that, given some logical set of beliefs, it was 60% likely for a coin to come up heads, and then I saw the coin come up tails, I would know what to do with that. The problem is all in going from the logical set of beliefs to the 60% probability of seeing the coin come up heads. But if you can’t follow that link, you don’t have a problem with “extending logic to probabilities” so much as you have a problem with “relating anything phrased in predicate calculus to an experimental theory that makes predictions”. Even if I could tell you the probability of the Goldbach Conjecture being a semantic tautology of second-order arithmetic after updating on observation of the first trillion examples being true, I might not have solved the problem of going from any set of axioms phrased in predicate calculus to an experimental prediction. You’re free to say of this that “probability can’t extend logic!” but I wouldn’t describe that as being the main issue.

An AI-grade solution to this problem, I suspect, will tackle that issue head-on and give a naturalistic account of physical worlds that obey logical axioms, assign priors to those different observers, and try to locate observers or agents inside those worlds. But this is not the same problem as assigning a probability to the Goldbach Conjecture.

Jaynes clearly didn’t get everything right. For example, Probability Theory: The Logic of Science discusses Jaynes’s disbelief in the Copenhagen interpretation (justified) but it’s clear that Jaynes has not understood the import of the very standard argument from Bell’s Theorem which rules out Jaynes’s suggested resolution. Even so, the part of the argument where probability extends logic and (the repaired forms of) Cox’s Theorem gives us reason to suspect we won’t find any other way to do it, seems quite clear and cogent to me; even more so when you consider that in decision theory we have no good alternatives to utility functions and utility functions want scalar quantitative weights on outcomes, so useful uncertainty has to be a scalar quantity. I think you would probably just be happier with the wording if we said “the theory of scalar uncertainty extends the theory of qualitative belief and disbelief, and there are no good alternatives to scalar probabilities when it comes to AI-grade theories of uncertainty”.

## Rationality is hard

Hello, and thank you for the reply!

Yes, that’s better—it is, at least, not known to be mathematically false, unlike the original! I think it’s a reasonable hypothesis, and rich enough to be a fruitful driver for a research program.

However, since we don’t have a theory of either certain or uncertain reasoning that is either normatively or descriptively adequate, “uncertain extends certain” can’t be more than a hypothesis at this stage. It may be that different methods are preferable in the two domains. (And this may be true whether we’re talking normatively or descriptively.) This is, in fact, the current state of the math: use predicate calculus for certain reasoning, probability theory for uncertain reasoning, and no general combined theory exists.

You are committed to probabilities as a representation of uncertainty, and in that framework the hypothesis seems likely. I have built AI systems that operate effectively in highly uncertain environments without making any use of probabilities, and that makes the hypothesis seem less probable to me. (My confidence is low enough that I wouldn’t want to choose even a rough subjective probability that “uncertain extends certain” is true, however.)

Probably not. The rest of this paragraph mostly doesn’t relate to what I was saying at all, as far as I can tell. I did mention in passing that relating logical theories to reality is an issue, but that wasn’t central to my point.

The main point is simply that probability theory does not extend logic. Some people have believed that it does (and have cited Cox’s Theorem as a proof). This is unarguably, mathematically, false. I don’t know how prevalent this wrong idea is. I put off finishing this piece for several years because it’s not clear it was worth the time it took, if few people are confused that way. You say, in your last paragraph, that “[Jaynes’] argument where probability extends logic … seems quite clear and cogent to me,” so I’m not sure whether you are clear on this yourself.

“Probability theory does not extend logic” is a small part of a broader analysis of what’s wrong with current formal approaches to rationality. This page made various hand-waving allusions to the broader story, without much support. I hope to write up more of it at some point.

Good luck with that. I spent enough years working on it to be confident it’s not going to go anywhere—but everyone has to find that out for themselves, perhaps.

I’m not sure I understand the “but” here; are you suggesting that something I said implies that I would think it was the same? I don’t.

It would be helpful to anyone reading your writing if you said “probability extends propositional calculus” instead of this, because what you wrote here is mathematically false and misleading. I hope you understand the vast difference; not everyone else does.

No, currently it does not. There is no known scalar theory of uncertainty that extends the best available normative theory of qualitative belief and disbelief (i.e. predicate calculus).

You may have a hypothesis that some unknown future scalar theory of uncertainty extends predicate calculus, or that it will extend some other, better theory of qualitative belief and disbelief. I know of no specific evidence for that (although we are all allowed hunches, of course).

I don’t know what you mean by “AI-grade.” Since (as mentioned) I’ve built AI systems that operate rationally in uncertain environments without use of probabilities, I’m somewhat skeptical, however.

## Ask the professor . . .

Thank you for this post. I have not had so much fun with math since I was a freshman, when I checked out the Raymond Smullyan logic puzzles from the college library. I would have loved to study logic and computer programming, but obviously not as much as I liked my fuzzy liberal arts curriculum.

I hope you will not send me to the corner, wearing a pointy hat, for asking “Logic 101” questions in the graduate forum. I spent a few hours reading “Probability theory does not extend logic,” and I would like to run through my layman’s conclusions based on your article. Then you can tell me how badly I’m astray.

First, I understand that your practical interest in logic is in designing AI computer systems. (I hope at least that assumption is correct.)

You start your post by referring to the five traditional bases for knowledge: Empiricism, rationalism, tradition, scripture, and intuition. In terms of artificial intelligence, “empiricism” would correspond to “observation” or “primary data input” (like through a camera). “Rationalism” would correspond to “logic” or “mathematical operations” (whether the system used is propositional calculus, predicate calculus, or some uber-system of FTL generalized rationalism that does not yet exist). “Tradition” would correspond to propositions that have been taught or placed into memory as programming instructions. I’m going to ignore “scripture,” because it’s a subset of “tradition.” I’m also going to ignore “intuition,” partly because the cognitive people seem to think that “intuition” is a shortcut we use for logic, based on pattern recognition and maybe statistics, because human beings don’t have infinite processing time to make truly logical decisions. Like that old joke (“we’re late because Dad took a shortcut,”) to me it doesn’t make sense to worry about intuition until we know where we are on the “logic” map.

When you compared probability notation to predicate calculus, you made the point that predicate calculus is more powerful because it allows logical quantification, instead of relying on implicit generalizations. This reminds me of some examples from the study of human language.

First example: In high-school English, your teacher may have made you diagram sentences, and jumped on you for using pronouns that don’t clearly point to the referent object. “Lindbergh was talking to Einstein, then he flew away.” (You may infer that Lindbergh was the “he” who flew away, because Lindbergh was an aviator, but this is bad grammar because you haven’t logically quantified the “x.”) People manage to communicate all the time using bad grammar, but sometimes misunderstandings do arise; and when they do, we resort to higher-level logic or grammar to sort things out.

Another example from the study of human language: Baby talk is thought to be a natural or organic precedent for formal language; that is, human infants develop a simplified form of communication (“baby talk”) before they master formal grammar. Baby talk doesn’t have much logical quantification, but it is possible to understand what your infant is trying to communicate through “abuse of notation and intelligent application.” For example, the joke tee-shirt you can buy on the web: “Let’s eat Gramma … commas save lives.” Propositional calculus (even extended by probability theory) could be seen as a form of baby talk, which allows people to swap useful information before they understand predicate calculus. By the same token, maybe predicate calculus could be seen as the “baby talk” preliminary to a more highly developed system–“probabilistic” (or maybe “proba-ballistic”) logic–that would allow us to reason more precisely about the probabilities of probabilities. (To use your FTL analogy, we might not hit warp speed with an incomplete theory of rationality, but at least we would reach near-Earth orbit.)

(By the way, I appreciated your borrowing all the “snarks” and the “boojums” to bring this more down to my level.)

Okay, let’s go back to the tripod of “empiricism,” “rationality,” and “tradition.” Sticking with our metaphor of a human child learning language, what comes first is the physical plant, or the embryo. In a way, biology is designed by code just as surely as any computer brain, because everything is encoded in a DNA “language” of C-T-A-G, and the DNA blueprint is built up into a physical structure. The physical structure of the brain allows it to perform certain “logical” operations. In addition to a brain, the embryo also develops eyes, ears, and sensory nerve endings—these are its “inputs,” which allow it to receive new data. In terms of AI, it doesn’t really matter if you’re talking about synthetic biology, or something made of silicon—you still end up with a “brain” and “inputs,” however different these might look depending on the technology.

Now that we have a physical plant—brain to perform logical operations, and sensory inputs to receive information—we are able to add two more legs to our tripod–“empiricism” and “tradition.”

Use the variable “Pr” to refer to “programmer” or “parent” (not “P,” so as to avoid confusion with “Probability”). Use the variable “C” to refer to either “computer” or “child” (I hope “C” is not arbitrarily linked to something else that would throw a monkey wrench into my examples). Use the variable “E” to refer to “environment” or “entry”–in other words, “E” is shorthand for whatever empiric experience is thrown into the mix, in the particular situation. Maybe we should also have a variable “L,” which would refer to “language” or “logic” interchangeably.

The first inputs received by “C” are basic stimuli (light, dark, heat, or single-syllabic utterances). “C” comes equipped with logic potentialities, but no actual programming, and it takes a while for these stimuli to be encoded into logical propositions (“mom = food.”) Presumably, this encoding involves what you called “statistical inference,” which allows reasoning from specifics to generalities.

However, there is more to developing “L” than just making statistical inferences based on “E.” Specifically, “C” receives inputs from both “Pr” and “E,” and both types of input will affect the development of “L.” If you’re trying to find an exception, where “Pr” does not apply, the stereotype of the “wild man in the woods,” or the “boy raised by wolves,” might come close. In this scenario—where the child has no parents–“L” is developed only through inputs from “E,” since there is no “Pr” in the equation. (Okay, this example doesn’t really work. Wolves are good parents—they teach valuable hunting behaviors, and they even have some language ability, as expressed in the form of howls. I just brought it up to show the importance of “Pr” in real-life situations.)

In the development of intelligence, the function of “Pr” is to introduce traditions, or “logical propositions,” to “C.” In the case of a human child, these “traditions” are supposed to socialize the child and prevent it from getting hurt. (“If you hit your sister, you will go to bed without supper.” “If you wash the dishes, you may watch TV until bedtime.”) These “traditions” include both form and content. The “if, then” logic form is conveyed to “C” by example, while the content (value judgments, rules, and consequences) is passed on explicitly.

Very often, “Pr” wishes these “traditions” to be accepted as “scripture”–that is, as something which may not be questioned. However, “C” is often presented with situations in which “E” contradicts “Pr.” In assessing these situations, “C” has to learn to assign probabilities to the different statements which “Pr” might make. Take these statements as examples:

Proposition 1: “If you happened to be looking out the window, you would see Santa land on the roof with a bag of presents.”

Proposition 2: “If you touch the stove, you will get burned.”

Let’s take the case where “C” tests Proposition 1, by secretly staying up all night on Christmas Eve. Santa does not appear. The reliability of “Pr” is called into question. “C” does not have sufficient empiric data to assess the truth-value of Proposition 2. “C” also lacks an advanced understanding of probabilistic logic, and is still using some kind of “baby talk” (propositional calculus) where things either “are” or they “aren’t.” By incorrectly applying logic (false syllogism), “C” may conclude that the truth-value of Proposition 2 is “false”; or “C” may correctly infer that further empirical data is needed to make an estimate of probability that “Pr” is reliable in any given case.

The result? “C” touches the stove and gets burnt, which is an example of empirical learning, and also provides some statistical data for assessing the truth-value of propositions put forward by “Pr.”

(Okay, right: GIGO. Garbage in, garbage out. If “Pr” is statistically unreliable, then “C” will have trouble learning to apply logical propositions to empirical data in such a way as to yield positive outcomes. In other words, “C” will be poorly socialized. In other words: “Like father, like son,” or “The apple doesn’t fall far from the tree.” The HAL computer in “2001: A Space Odyssey” went crazy because its programmers taught it to lie.)

I know these ideas won’t help you to perfect a mathematical formulation of rationality. I hope I haven’t wasted too much of your time! But I did want to say, I found your ideas tremendously helpful in clarifying my own thoughts about language, logic, and artificial intelligence. Thanks again for sharing on the web, in bite-sized pieces that even a non-expert can find time to digest.

## Who says all math can be expressed in predicate calculus!?

“All math can (in principle) be expressed in predicate calculus”

That seems like a crazily overconfident, eternalist statement to me.

What about theorems in intuitionistic logic, which lose shades of their meaning if you force them into a boolean world?

What about modal logics?

What about type theories? Are you so sure that no type theories exist that are worth calling “math”, but cannot be interpreted in a predicate logic?

What if someone does figure out a way to include probabilistic uncertainty natively into logic? This would be an achievement which is not the same thing as doing probability theory as a kind of measure theory under ZFC!

## The power of predicate calculus

Hmm, this was meant as a statement about the current state of mathematical knowledge, not an Eternal Truth (and was also quite off-hand, not a significant point in the argument).

As far as I know, however, it is in fact true. I would be really interested if it turns out not to be!

Generally, non-standard logics can be embedded in predicate calculus as a set of axioms, such that a deduction is valid in the axiom system iff it is valid in the non-standard logic. This adds a layer of complexity, so it’s not useful to do it in terms of practical deduction, but it can be useful as a way of making sure you understand the non-standard logic.

You can certainly do this with modal logics; that’s what Kripke semantics is about.

I never studied intuitionistic logic formally, and it’s been 30 years since I read much about it informally, but from what I remember I’m pretty sure you could do the same thing with it. (If not, I would be interested to read about that!)

I know even less about type theories; they weren’t well-developed when I did logic. Again, if some can’t be embedded in predicate calculus, I’d be interested to learn about that.

Agreed; no two things that are different are the same. I don’t think this would contradict the claim that “All math can (in principle) be expressed in predicate calculus,” however.

I said “in principle” because usually it’s a bad idea. It’s mostly not a good idea even if you are doing, say, functional analysis, where it’s entirely uncontroversial that you can do it in principle.

## embedding, interpreting and expressing

Thanks for the response.

There’s a lot of really interesting territory contained in the question of what it means for one language to be embedded or expressed in another. For example, by the standard of embedding you gave ZFC is embeddable in PA, by writing a sentence φ of ZFC as “ZFC proves φ” in PA. But we don’t say PA can express everything ZFC can express, because if ZFC proves some statement about integers, PA can say that PA proved it, but not believe it’s true about the “real” integers.

So while it’s true that some modal or intuitionistic theories can be fully expressed inside first order theories like ZFC as statements about Kripke models, I don’t know if there’s any way of interpreting a more powerful intuitionistic theory like the Calculus of Constructions into ZFC without either treating it as a language game, or squishing it down into a boolean theory.

Here’s a really neat paper that shows how CoC + universes can be interpreted in ZFC + inaccessible cardinals and vice versa. It squishes intuitionistic propositions into True and False though.

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=237B5B116459B31…

Curtis Franks has a really interesting discussion of these sorts of questions related to provability in weak theories of arithmetic in http://www.cambridge.org/us/academic/subjects/philosophy/philosophy-scie…. The rest of the book is great too.

So yea, “can be expressed*”, but the asterisk reads “actually what it means to express something from one language in another language is really subtle and interesting in its own right”.

## really subtle and interesting in its own right

Thanks! Franks’s book looks really interesting.

More generally, I’ve always been concerned with the issues in the foundations of mathematics, and consider them somewhat relevant to the general topic of the Meaningness book. I have some draft material about that which will go in at some point (if I don’t die first).

## What about fuzzy logic?

Isn’t Fuzzy Logic something similar to probability theory that can be used to reason about uncertainty?

Especially given the engineering question, to quote:

“Are there times when we should use one of the alternatives, instead of probability theory?”

In practice it is probably often easier to implement simple fuzzy controllers than to model the uncertainties involved using probability theory.

## Fuzzy control systems

Yes—maybe—I don’t know a lot about them. There was a wave of hype in the 80s, and I haven’t heard much since. Maybe I just haven’t been listening in the right places?

## Fuzzy Online Places

Like shoulder pads and the cone bra the fuzzy hype did not make it much past the 80s, but the field carried on regardless.

On the engineering side fuzzy control became just another tool to deploy, and it found application in a wide variety of use cases.

This paper presents a nice representative cross section: http://www.researchgate.net/profile/Hani_Hagras/publication/267097192_T2…

On the theory side the underlying mathematics has been made quite rigorous: http://www.mathfuzzlog.org/index.php/Handbook_of_Mathematical_Fuzzy_Logic

My Bayesian friends really don’t like fuzzy logic. They argue that Cox’s theorem means there’s no need for it, but to me this seems to be based on a profound misunderstanding.

## a profound misunderstanding

Yes—that is indeed a profound misunderstanding of what Cox’s theorem says.

Bayesianism is a religion, unfortunately. Often cultists invoke Cox’s theorem as a holy text which proves that BAYES IS THE ONE WAY TO TRUTH AND UTILITY FOR ALL. I wrote this piece to point out that it doesn’t say that.

I thought shoulder pads were ridiculous when they came out—my girlfriend had them—but they look good in retrospect. I’d like to see a revival.

## Errors in scientific practice

I’m interested in your observation that misunderstanding the relationship between logic and probability leads to logical errors in scientific practice and to probabilistic methods being misapplied. It chimes in with some thoughts I had, for example about how the concept of a confidence interval is very commonly misunderstood and misinterpreted. Do you have any other examples of what you mean, or perhaps some references you can point me to? Much appreciated.

## Some help to a non-STEMer

I tried to make it through this page, but a lack of a STEM education makes this difficult going. I’m hoping you can help tell me if I’m on the right track. My understanding is something like this:

Both logic and probability theory can be used to assign certainties to atomic claims (“that bucket is full of water”) and use those certainties to derive measures of certainty for compound claims (“that bucket is full of water and also red”). Probability theory can make finer distinctions in certainty than logic can.

However logic can also assign certainties to universal claims (“all buckets are red”) and existential ones (“some buckets are red”). Probability theory cannot do that in general. Either you’d have to treat universal statements as infinite conjunctions (and then when you multiply the probabilities you’d get 0?) or else it’d be unclear what to do with a universal statement with a probability other than 1 or 0 (if there’s a 50% chance that there exists a red bucket, what does that tell me about any particular bucket?).
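The worry about infinite conjunctions can be made concrete with a small sketch. It assumes, purely for illustration, that the individual claims are independent and that each has the same probability `p` (both are assumptions, not part of the original argument). Under those assumptions the probability of the conjunction is `p ** n`, which shrinks toward 0 as `n` grows whenever `p < 1`.

```python
def conjunction_probability(p: float, n: int) -> float:
    """Probability that n independent claims, each with probability p, all hold.

    Independence and a shared p are illustrative assumptions.
    """
    return p ** n


# Even at 99% confidence per bucket, "all n buckets are red" collapses quickly.
for n in (10, 100, 1000, 10000):
    print(n, conjunction_probability(0.99, n))
```

Without independence, or if the individual probabilities approach 1 fast enough, the product need not go to 0, so this is only the simplest case of the problem.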

One way I’m trying to move this toward things I understand better is thinking about other “systems”. For example, is the Marxist account of economics true? What about Friedman’s account? Both make universal claims about all economies (of a given sort); both have an internal logic and allow inferring some propositions from others. How do we describe our certainty in them? It seems ridiculous to say that either is “true” or “false”, but it also seems ridiculous to say that one is, say, “30% true”. A description of our certainty in one of these intellectual frameworks would have to describe where it applies, to what degree, what parts of the environment it ignores, what variations and scales and times it applies over.

Is this understanding and example accurate for what you were trying to say, David?

## Probability & logic

Sorry this was hard going! I tried to make it as simple and clear as possible, but it was really meant to help people who have the wrong idea that probability theory extends logic.

Generally those people have a fairly solid mathematical background, and I hoped that the explanation here would be sufficient to de-confuse them. (It doesn’t seem to have worked, though. That might be because my explanation was unclear, or because their religious belief in Bayes!! overrides their ability to understand.)

Formally, yes; but in practice the “implicit generalization” abuse-of-notation gives probability theory the ability to make universal claims about properties of individual objects: P(red|bucket) = 1.0.

What it can’t do is express relationships among multiple objects, like “every vertebrate has exactly one father.”
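For comparison, here is one way that sentence might be written in predicate calculus. The predicate names Vertebrate and Father are hypothetical, chosen only for this illustration; Father(y, x) is read as “y is the father of x”.

```latex
\forall x \,\Bigl( \mathrm{Vertebrate}(x) \rightarrow
  \exists y \,\bigl( \mathrm{Father}(y, x) \land
    \forall z \,( \mathrm{Father}(z, x) \rightarrow z = y ) \bigr) \Bigr)
```

The nested quantifiers over x, y, and z relate several objects at once, which is the part that a notation like P(red|bucket) has no place for.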

For questions like this I don’t think any sort of formal rationality is of any use. Definitely not either mathematical logic or probability theory, nor any combination.

Yes. This starts to require real (meta-systematic) thinking, which is beyond what formal rationality (of any sort) is helpful with. “How To Think Real Good” waves in the direction of tools for that.

## This post is stuck in my head

(This was going to be an email until I realised you don’t publish an email address, so it’s a little unfocussed. Which is probably exactly why you don’t publish an email address, but well, I’ve written it now.)

I just wanted to say that I’ve been reading your writing over your various sites a lot lately and getting a lot out of it. I originally got here via Slate Star Codex and the people of rationalist-adjacent Tumblr, which I’ve been hanging around the edge of. This post in particular has got severely stuck in my head, and I keep rereading it.

I have to admit that I’m not convinced on the ‘civilisational collapse’ framing. For example, I’d definitely like more context for your claim that ‘major institutions seem increasingly willing to abandon systemic logic: rationality, rule of law, and procedural justice’ - are there any concrete examples you’re thinking of here? But I find the broad outline of the stages and the paths between them really inspiring, and I’m really looking forward to seeing where you go with it.

I’m particularly fixated on that figure you drew with ‘past, current, and potential future ways beyond stage 3’… it makes a lot of sense to me. Just as a bit of context, here’s a description of my own paths through that diagram.

My parents studied languages at university in the sixties, back before pomo got its claws into the curriculum, and ended up with something very like your ‘Stage 4 via humanities education’. I never got much of an arts education at all outside of music, but what I do have is mostly from reading their books, and so I think I have some understanding of what this is. The bit I picked up was heavy on analytic philosophy and the New Critics - up-to-the-minute stuff like Bertrand Russell, A.J. Ayer, I. A. Richards, William Empson, T.S. Eliot, L. Susan Stebbing. And I got a lot out of it - there’s a lot that’s plain wrong (logical positivism! the objective correlative!), but they all wrote so clearly that at least you can tell where they’re wrong. I’m still in love with their writing style. And I don’t know, I’m really grateful to the New Critics for giving me some framework for enjoying literature, even if it’s a limited one and my tastes are still more shaped by it than they maybe should be.

So that’s my experience of the top line of your diagram. For the bottom line, I did get a really decent science education - my science and maths teachers were great at school, I did a maths degree and then a physics phd. I’ve definitely managed ‘Stage 4 via STEM education’. And then also as a student I read a shit-ton of pop science, Pinker and Dennett and Penrose and so on, and drank in plenty of the New-Atheism-and-laughing-at-homeopathy atmosphere that the internet was filled with ten years ago.

All this is kind of a long-winded way of saying that I really went to town on Stage 4. And was insufferable about it to exactly the level you’d expect - pomo was obvious nonsense, Sokal had shown them all up as charlatans, religion was a pointless source of woo, all the usual. It’s probably good that I didn’t get a modern arts education because I’d just have been obnoxious and argued all the time.

Obviously by now I’d like to move on. I guess I have been for at least the last five years or so, but not in a very organised way. I haven’t managed any full-blown nihilist STEM depression (don’t really have the temperament for it) but I did have a good line in aimless confusion for a while. I probably do have just enough of a background for the ‘genuine pomo critique’, and actually I’ve been vaguely intrigued by postmodernism and earlier continental philosophy for a while, but I never really know where to make inroads - Foucault sounds interesting and some of the suggestions above are excellent. Whitehead sounds like a particularly good path for me, but one I’d never thought of myself. And a native STEM bridge beyond stage 4 would be wonderful - I’m definitely up for tagging along on that project!

Anyway thanks very much for writing it all!

## A native STEM bridge beyond stage 4 (duplicate)

Hi, you posted this comment on two different pages (probably due to trouble with the spam filter—sorry about that!).

I’ve replied on the other one, here.

## Re: probability and logic

Here’s the formalization that you appear to be looking for:

Given a first-order language L and an underlying set X, fix a probability distribution mu over interpretations of the constants, functions, and relations of L in X. Now expand L to a language L’, a two-sorted language with sorts X and ℝ. L’ should consist of all the symbols of L (which apply to the sort X), the language of arithmetic (which apply to the sort ℝ), and an additional logical symbol P whose syntax is that if phi is a formula, then P(phi) is a term of sort ℝ. Then sentences of L’ can be assigned truth-values in the model (X, mu) in the obvious inductive manner.

As for whether “P(phi) = 0.4” is true if P(phi) is actually 0.400001, no, of course it’s false, because 0.4 and 0.400001 are different. This isn’t actually a problem because you can just use inequalities instead, like “0.39 < P(phi) < 0.41”.

That formalization appears to give you what you were asking for, but it is still no good, because you were asking for the wrong thing. You claimed that “P(boojum|snark) = 0.4” means the same thing as “∀x: P(boojum(x)|snark(x)) = 0.4”, but it does not. “P(boojum|snark) = 0.4” means that if you randomly select a snark, there is probability 0.4 that it is a boojum. This does not imply that P(boojum(Edward)|snark(Edward)) = 0.4. Maybe you have some prior reason to believe that Edward is less likely than average to be a boojum, and P(boojum(Edward)|snark(Edward)) = 0.3 instead. Maybe you know for certain that Edward is not a snark, and then P(boojum(Edward)|snark(Edward)) is undefined.

## Not quite obvious

A lot of pretty smart people have worked on this for many decades, and so far not succeeded, as far as I know.

The details of your proposal are not obvious to me.

For starters, I don’t know what it would mean to assign a probability to a constant, function, or relation. What is the probability of 23, or +, or >? (Or, in the example domain, of Edward, or of “father”?)

## Re: Not quite obvious

Toy example: Let the language be the language of arithmetic together with a constant symbol c and a unary relation symbol R, and let ℕ be the underlying set. Let the symbols from the language of arithmetic (0, 1, +, and ×) be interpreted in the standard way with probability 1. Let c be interpreted as the number n with probability 2^(-n-1). And let R be interpreted such that R(n) holds with probability 3^(-n), with R(n) and R(m) being independent for distinct n and m, and R(n) independent of c. Then P(R(c)) = sum over natural numbers n of 2^(-n-1)3^(-n) = sum … of (1/2)6^(-n) = (1/2)(1/(1-(1/6))) = 3/5.

Another possible probability distribution over interpretations of c and R is that c is interpreted as n with probability 2^(-n-1) and R is interpreted as “is prime” with probability 1/2 and as “is a multiple of c” with probability 1/2. Then P(R(4)) = (1/2)(probability 4 is prime) + (1/2)(probability that c is 1, 2, or 4) = (1/2)(1/4 + 1/8 + 1/32).
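The arithmetic in the two toy examples above can be checked mechanically. The following is only a sketch, assuming that the natural numbers start at 0 (which is what the closed form 3/5 requires) and that “4 is a multiple of c” is false when c = 0. The function names are hypothetical and are not part of the original comment.

```python
from fractions import Fraction


def p_c(n):
    """P(c is interpreted as n) = 2^(-n-1), for n = 0, 1, 2, ..."""
    return Fraction(1, 2 ** (n + 1))


def p_r_first(n):
    """First distribution: R(n) holds with probability 3^(-n), independently of c."""
    return Fraction(1, 3 ** n)


def partial_p_r_of_c(terms):
    """Partial sum of P(R(c)) = sum over n of P(c = n) * P(R(n))."""
    return sum(p_c(n) * p_r_first(n) for n in range(terms))


def is_prime(k):
    """Trial-division primality test, adequate for small k."""
    return k >= 2 and all(k % d != 0 for d in range(2, int(k ** 0.5) + 1))


def p_r_of_4_second():
    """Second distribution: R is 'is prime' or 'is a multiple of c', each with probability 1/2."""
    prime_branch = Fraction(1 if is_prime(4) else 0)
    # 4 is a multiple of c exactly when c is 1, 2, or 4 (c = 0 is excluded by assumption).
    multiple_branch = sum(p_c(n) for n in range(1, 5) if 4 % n == 0)
    return Fraction(1, 2) * prime_branch + Fraction(1, 2) * multiple_branch


if __name__ == "__main__":
    print(float(partial_p_r_of_c(40)))  # approaches 3/5
    print(p_r_of_4_second())            # exact value of (1/2)(1/4 + 1/8 + 1/32)
```

Exact rational arithmetic is used so that the partial sums can be compared against the closed form without rounding error; the geometric tail after N terms is (3/5)·6^(-N).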

## A significant open problem

OK, I don’t think this is going to work. But, rather than arguing details, I’ll mention again that—as far as I know—this is a significant open problem in mathematical logic. If you have a solution, it’s definitely worth a PhD, probably a professorship, and possibly tenure. I encourage you to write it up in more detail, run it past experts in the field, and (if they are enthusiastic) submit it for publication.

## significant open problem?

What precisely is it that you’re claiming is a significant open problem? Combining probability measures and first order structures has been done in several ways. For instance, Keisler measures, which are probability measures over the Boolean algebra of definable sets of a first-order structure, are commonly used in model theory (technically, a Keisler measure is only required to be finitely additive, but it doesn’t make a difference in countably saturated structures, and there’s no reason you couldn’t consider countably additive Keisler measures anyway). I think descriptive set theorists sometimes talk about probability measures on isomorphism-classes of countable models of first-order theories. You haven’t convinced me that you’ve identified something important that’s missing from already known concepts that combine logic and probability.

The suggestion you gave for a direction in which to try to combine logic and probability is misguided (as I pointed out in my first comment), enough so that I wouldn’t believe that capable researchers thought it worthwhile to explore that direction. And the fact that I quickly came up with a formal framework (which certainly isn’t worthy of a PhD thesis, or even a publication) that follows your suggestions also indicates it is unlikely that people were trying to do that and failing.

To be clear, it wouldn’t surprise me if smart researchers have thought of something unsatisfactory about the already known ways to combine logic and probability, which they have failed to resolve. But if so, I don’t know what these problems are, and your post didn’t adequately describe them.

## Open problem

A unified treatment of logic and probability theory that allows unrestricted (or at least fairly general) nesting of logical quantifiers and P().

This recent review paper lays out some of the issues. It’s not comprehensive—and, in my view, tends toward overoptimism—but it could be one starting point if you want to get into this further.

Yes; I know of several others, different from the ones you mention. None of them are very general, nor very useful (as far as I know).

I myself haven’t done anything! Nothing in this post was original. It’s all standard undergraduate-level mathematics.

Yes, my post isn’t about that. It only addresses the basic misconception that probability theory (by itself) extends logic. You are not part of its target audience; your understanding of the matter goes far beyond that elementary error.

## Well this is embarrassing; I

Well this is embarrassing; I just realized the formalism I suggested does not behave like what you were looking for after all. The sentences in the language I was calling L’ have probabilities rather than truth-values like I incorrectly said they did (not really a problem so far), but if phi is a sentence and r is a real number, then sentences of the form P(phi)=r always have probability either 1 or 0 in any model, whereas you suggested that you’d want it to be possible to express uncertainty about such claims.

http://intelligence.org/files/DefinabilityTruthDraft.pdf seems like the framework it uses is very similar to the problem you stated, but IIRC that paper involves various unsatisfying weirdnesses like relying heavily on nonstandard models. Perhaps you already knew about it; idk.

It seems I misinterpreted you as making stronger claims about what a unification of predicate logic and probability theory would look like than you actually were. I guess I shouldn’t have given you such a hard time over the difference between “P(boojum|snark)=0.4” and “∀x: P(boojum(x)|snark(x)) = 0.4” if you were simply illustrating one conceivable way that such a logic-probability unification could look, rather than claiming it would definitely look like that. You’re definitely right that probability theory does not itself extend predicate logic, though I think you overestimate how often people who say “probability extends logic” are confused about this rather than just using “logic” to refer to propositional logic.

Anyway, sorry if I was making an ass of myself.

## Thanks

I’m sorry, I should have explained why I was skeptical about your construction rather than just saying “I don’t think this is going to work.” I’m trying to get something written by Monday so I didn’t want to take the time.

Anyway, I appreciate your willingness to point out your own mistake when you’ve recognized it! Too few people will do that, and it definitely proves that you are not an ass.

Thanks for the pointer to the Christiano et al. paper! I took a cursory look at this when it first came out. My recollection is that it only deals with P(ϕ) where ϕ does not contain P(). I may remember wrongly. In any case, my understanding is that the paper had some problems and was never published.

MIRI, the institute that the authors were affiliated with, has continued work along the same general lines. They hope it will cast some light on artificial intelligence, which I consider extremely unlikely, but their most recent draft is mildly interesting as mathematical logic. You might like to take a look.

## Jaynes

“Jaynes is just saying “I don’t understand this, so it must all be nonsense.””

Which is itself a form of mental projection fallacy.

## Proof of title?

Could you source the statement “probability theory is mathematically proven not to extend logic”? I’d like to see those mathematical proofs, or at least see abstracts about them, at the graduate math level or below.

It seems you have no issue with probability theory as a theory of belief, confidence, how they are attached to a claim -or a family of claims-, and how they interact with- and evolve with- new and prior information. Is that correct?

Also, I’m curious if you’ve finished reading ET Jaynes’ book.

## Probability

Did you read the article? It explains rigorously and in great detail.

This is undergraduate-level stuff. In fact, you don’t need to have taken even an undergraduate-level logic course to follow the explanation.

No. Probability theory is completely inadequate for any of those applications.

I didn’t read the whole thing. I’ve read much of it. It has major technical flaws, so there’s no point in being thorough. Also, if you know the relevant mathematics, it’s easy to see where he’s going if you skim.

## Confusion

So..I can define a “Probability Monad” in some strongly typed programming language…and then use the Curry-Howard “isomorphism” to translate those types back into first-order logic.

Which seems to solve the issue to me… what does this not handle?
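For readers unfamiliar with the term, here is a rough sketch of what a “probability monad” usually means, restricted to finite distributions. It is written in Python rather than a strongly typed language, so it shows only the monad operations, not the types that a Curry-Howard reading would need. The names `Dist`, `unit`, `bind`, and `prob` are assumptions chosen for illustration, and the sketch says nothing either way about whether the construction handles quantifiers.

```python
from collections import defaultdict
from fractions import Fraction


class Dist:
    """A finite probability distribution: outcomes mapped to exact weights."""

    def __init__(self, weighted_outcomes):
        merged = defaultdict(Fraction)  # missing keys start at Fraction(0)
        for outcome, weight in weighted_outcomes:
            merged[outcome] += Fraction(weight)
        self.weights = dict(merged)

    @staticmethod
    def unit(value):
        """Monadic 'return': the distribution that is certainly `value`."""
        return Dist([(value, 1)])

    def bind(self, f):
        """Monadic 'bind': draw x from self, then draw from the distribution f(x)."""
        return Dist(
            (y, p * q)
            for x, p in self.weights.items()
            for y, q in f(x).weights.items()
        )

    def prob(self, event):
        """Probability that the predicate `event` holds of the outcome."""
        return sum((p for x, p in self.weights.items() if event(x)), Fraction(0))


# Usage: two independent fair coin flips, combined with bind.
coin = Dist([("H", Fraction(1, 2)), ("T", Fraction(1, 2))])
two_flips = coin.bind(lambda a: coin.bind(lambda b: Dist.unit(a + b)))
print(two_flips.prob(lambda s: s == "HH"))
```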

## Wrong-way reduction

This would be a wrong-way reduction. It demonstrates that everything probability theory can do, logic can do. (Which is true.)

You’d need to do the inverse: to encode every logical statement as a probability theory statement. That is not possible.

## You should just know, this

You should just know, this post sounds incredibly snobbish and dismissive of very respected and knowledgeable people about these subjects. As you have said, this is “undergraduate-level” math, I am an undergraduate (full disclosure!) but this post is dramatized and overwrought in a way that is an immediate red flag for any level of rigor or clarity. If the point you were making was really fundamental–and the tradeoffs, limitations, or unverified assumptions whose existence you are convinced of really were mathematically basic statements about the limits of probability–this wouldn’t have come out sounding like such a pompous screed. This post simultaneously exudes confidence, clarifies nothing, and bandies about understanding of the most basic mathematical concepts like it’s the ultimate proof of the author’s superiority. This is a manifesto devoid of results. If it does contain any, they are either too underdeveloped or faintly-sketched to register as anything at all. Please, you are making the lives of people who are actually seeking understanding harder with your arrogance. I’ve seen how you respond to other posts–I am not interested in your response to this. I have selected not to be notified of further comments on this forum. Best of luck with this whole massive, completely indecipherable enterprise this blog seems to represent.

## RE: Wrong Way Reduction

Propositional logic embeds trivially in Probability theory, and the extension to Predicates is orthogonal.

Ontology (in this context, figuring out what space of ideas is both usable, and contains the helpful ideas) is in neither Logic nor Probability Theory.

Do you have an example of a logical statement that can’t be embedded in Probability Theory?

## Ontology is logic with additional structure

In the computer science context, ontology is a model of selected or constructed entities that are useful for some given domain, and the logical relationships between them.

Any effective ontology has to *use* logic, but ontology adds the above elements. So it’s logic with additional structure that enables it to be applied to the real world.

## The real challenge is how to combine deduction and induction

David,

It seems that we have *deductive* logic on the one hand (pure math such as 1st-order logic) and *inductive* logic on the other (applied math such as, for instance, probability theory).

But something is missing from both… as David Deutsch emphasizes in his books (‘The Fabric Of Reality’ and ‘The Beginning of Infinity’), the art of generating good explanations remains a mystery.

Could it be that this missing art (which you call ‘meta-rationality’) is equivalent to *abduction*, inference to the best explanations?

And perhaps the way to obtain this missing art of abduction is to learn how to *combine* deduction and induction into an integrated system.

## Putting it together. Combine Induction&Deduction for Abduction!

Deepmind just posted a link to a new paper illustrating exactly what I’ve been suggesting. Link here:

https://arxiv.org/abs/1711.04574

In the paper above, the authors start with classical logic as applied in logic programming, then they allow for many-valued (non-classical, non-monotonic logic). They then attempt to mix this with neural networks!

So they’re starting to attempt exactly what I suggested above.

Deductive methods (logic programming), mixed with Inductive methods (non-monotonic many-valued logic and machine learning), yields the beginning of an entirely new kind of method….Abduction!

## Every vertebrate has exactly one father

“Every vertebrate has exactly one father.” That example is explained in the page.

## Correct David, Deductive logic is more general than inductive!

I understand everything so much better now!

Inductive reasoning (of which ‘probability theory’ is actually only a part) is just a ‘fragment’ of deductive logic. That is to say, every aspect of inductive logic can be embedded in deductive, but not vice-versa.

In fact, probability theory is not even the most general form of inductive reasoning! I now think that the crown goes to a marriage of ‘type theory’ (a form of logic programming) with ‘non-monotonic logic’ (many-valued logic).

That is to say, I think type-theory/many-valued logic is the most general form of inductive logic! (probability theory is only a ‘fragment’ of this).

In turn, type-theory/many-valued logic is only a ‘fragment’ of deductive logic! This follows from the Curry-Howard correspondence!

https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence

Note, every aspect of type-theory and many-valued logic can be embedded in category theory (pure deductive logic), but *not* vice-versa!

So inductive logic is a fragment (embedded in) deductive logic.

## Think of cognition as the filtered light of ultimate reality!

OK, imagine that ‘reality’ is ultimately just the categories of category theory! That is to say, imagine that mathematics exists ‘out there’ and it’s all just ‘categories’! Think of the categories of category theory as the ultimate reality and imagine them to be ‘sunlight’.

But the blinding light of the sun is too ‘pure’ to be sighted directly. It can never be captured by ‘thought’ in its entirety. The metaphor I’m making here is that reality can never be understood in its entirety (it’s beyond any formal system), so we need to ‘filter it’. This, I think, is the key insight of ‘meta-rationality’.

OK, now imagine ‘Inductive Logic’ as a pair of sun-glasses that filters the pure light of ‘Deductive Logic’ (Mathematical Categories). The filter blocks out some of reality, in effect, ‘structuring it’.

I’m suggesting that this filtering of reality is equivalent to ‘cognition’ (meta-rationality or intelligence or whatever you want to call it), which is ultimately *abduction*, the art of generating good explanations.

In my equation relating the 3 types of logic then, the ‘x’ sign here refers to the filtering operation.

Deduction x Induction = Abduction (Cognition)

The pure light of ultimate reality (the ‘categories’ of deduction) is structured or filtered by Induction, and the result is Abduction, or Cognition.

## “Gnaw and tug at the posts, and you will slowly loosen them up!”

If we take a Cartesian Closed Category, this is equivalent to Typed Lambda Calculus. So there’s our initial *deductive* structure.

Now we’re going to perform a filtering operation, by transforming the above structure using *inductive* structure.

To do this, deploy fuzzy logic! Instead of bivalent (two-value) truth-conditions, we use a continuum of truth conditions (infinite-valued logic).

The filtering will transform this into a model of a dynamical system by deploying Temporal Modal Logic. The result is an *ontology* - an *abductive* structure that is the first component of general intelligence!

Let’s use the same trick with another type of *deductive* structure – a hyper-graph.

Now transform this using an *inductive* structure – in this case simply put a probability distribution on it to obtain random graphs.

The filtering will transform this into a model of dynamical systems by deploying stochastic models. The result is a *network* - an *abductive* structure that is the second component of general intelligence!

Finally, we’ll deploy the trick with a third type of *deductive* structure – a manifold.

Take the manifold and transform this using the last type of *inductive* structure – an *information geometry*.

The filtering will transform this into a model of dynamical systems by deploying data compression coding. The result is a *search algorithm* - an *abductive* structure that constitutes the third component of general intelligence!

…

“One day you’ll break the fence that held your forebears captive!”

## Bayes is a domain of mathematics, not cognitive science

David,

Yes, I think I can now explain much more clearly what’s wrong with ‘Less Wrong’ style rationality. As you say, it’s just ludicrous to take Bayes theorem as some sort of ‘magic key’ that explains all rationality. But now I think I’ve pin-pointed exactly where the Bayesian cultists are going wrong, and I can explain it.

OK, imagine that you want to learn about psychology, and you go along to a lecture series run by a guy named ‘Sliezer Budkowsky’, who’s claiming to have the ‘magic key’ to all psychology. You sit through all the lectures and are amazed to hear that CHEMISTRY is the key to all psychology!

Imagine that you’re interested in insights into personality traits such as ‘Open-ness to Experience’, ‘Introversion’, ‘Extroversion’ etc., you sit through the lecture series, but are astounded to hear that Budkowsky spends the entire lecture series on psychology discussing CHEMISTRY - chemical bonds, the periodic table, acids-bases etc., and then triumphantly concludes with ‘And that’s the key to all of psychology!’

In reality of course, you’ve actually learned nothing at all about psychology. The reason is that psychology operates on a *different* level of abstraction from chemistry, and for real explanations of psychology, you need concepts and models that are appropriate for the domain you want to learn about.

Similarly, if I want to learn about machine learning, unsurprisingly I need to study well…machine learning. Bayes theorem won’t help much.

If I’m specifically learning about statistics and probability theory (branches of applied mathematics), then Bayesian models are appropriate. But if I switch domains, Bayes theorem quickly loses its relevance.

For one thing, in pure mathematics, the models are usually only appropriate for very simple (idealized) situations. For example, in probability and statistics university courses, they’re usually dealing with situations where they’re only looking at one variable (univariate statistics), and pure math can help. But of course, in the real world, you have multiple interacting variables (multivariate statistics), and an entirely different set of concepts and models would need to be deployed for that more general situation. And indeed, the numerical methods deployed in machine learning to deal with multivariate situations bear little resemblance to the stuff you learn in probability+stats courses.

In a nutshell, probability and statistics is about…well probability and statistics ;) There’s a set of concepts appropriate for a given level of abstraction. But change the level of abstraction you’re looking at, and these concepts no longer apply.

If I want to crack intelligence, I need methods of knowledge representation that can generate *new* concepts and models across *many* levels of abstraction. If we imagine the space of all knowledge domains, then probability and statistics is only a very limited sub-set of this. So it can’t possibly be the key to general intelligence.