You are reading a metablog post, dated August 9, 2013.

☞ The next metablog post is Going down on the phenomenon.

☜ The previous metablog post was Pop Bayesianism: cruder than I thought?.

This page’s topic is Rationalism.

Copyright ©2010–2019 David Chapman.

## Comments

## Talking past one another

So my problem with the substantial advice on thinking that you give in this post is that… I don’t disagree with it. Nor do I really think that it contradicts anything that has been said on LW. In fact, if it was somewhat polished, cut into a set of smaller posts and posted on LW, I expect that it might get quite upvoted.

One thing that has started to increasingly bother me in this debate is that you seem to be saying essentially "yes, Bayesianism is nice, but there's a lot more to rationality too". And, like I hinted at in my earlier comments but could have emphasized more, I'm not sure if you would find anyone on LW who would disagree! Eliezer spends a bunch of posts talking about how wonderful Bayesianism is, true… but while he certainly does make clear his position that Bayesianism is *necessary*, I challenge you to find any where he would claim that it is *sufficient*. And ultimately there isn't *that* much content on LW that would talk about Bayesianism as such. I would in fact claim that there are more posts on LW that are focused on providing "heuristics for informal reasoning", like in this post, than there are posts talking about Bayesianism.

Given that, I find that this post is somewhat talking past the people who you are responding to. As I see it, the argument went something like this:

This seems to me similar to this (hypothetical) debate:

## Stuff

Yeah, I feel the same as Kaj. The example I was going to use was mathematics as a whole: “People say math is important. And one example given of math being important is that you can use it in baking, for example adding half a pound of sugar to another half pound of sugar. But this is only a tiny part of the problem. First you have to obtain the sugar, which may require growing sugarcane or locating a supermarket. Then you have to measure it. And sugar isn’t even an ontologically fundamental concept - defining ‘sugar’ is prior to this whole enterprise! So overall I think math is only a tiny part of baking, which means mathematicians are silly to ascribe so much importance to it, which means anyone who tries to teach math is part of a cult.”

As for your own epistemology, it reminds me a lot of virtue ethics, so much so that I’m tempted to call it “virtue epistemology”.

My critique of virtue ethics (see the part in italics after the edit here) seems like it could fit here as well (oh, man, I actually used that same grammar metaphor - apparently I am nothing if not predictable). Yes, the natural human epistemological faculty is extremely complicated and very good at what it does and any mathematization will be leaving a lot out. On the other hand, I think it is good to try to abstract out certain parts and fit them to mathematical formalism, both for the reasons I describe here and because a very inadequate partial understanding of some things is the first step towards a barely adequate understanding of more things.

Since I’ve already stretched that poor grammar metaphor to its limit I’ll continue Kaj’s discussion of physics. Imagine if someone had reminded Archimedes that human mental simulation of physics is actually really really good, and that you could eyeball where a projectile would fall much more quickly (and accurately!) than Archimedes could calculate it. Therefore, instead of trying to formalize physics, we should create a “virtue physics” where we try to train people’s minds to better use their natural physics simulating abilities.

But in fact there are useful roles both for virtue physics and mathematical physics. As mathematical physics advances, it can gradually take on more of the domains filled by virtue physics (the piloting of airplanes seems like one area where this might have actually happened, in a sense, and medicine is in the middle of the process now).

So I totally support the existence of virtue epistemology but think that figuring out how to gradually replace it with something more mathematical (without going overboard and claiming we’ve already completely learned how to do that) is a potentially useful enterprise.

## Confusion about differences

I would like to start by pointing out that it's really hard to understand claims that someone else makes about epistemology. We like to understand what someone else says through our own lens of how the world works. People don't change their own epistemology fundamentally after reading a single blog post that illustrates a weakness in their way of viewing the world.

Most people take months for a process like that instead of a few hours.

From my perspective there seems to be a clear disagreement.

Eliezer says that Bayesianism is necessary while Chapman says it isn't.

Chapman seems to argue that it’s often useful to reject the use of Bayes formula to reduce the complexity of a model of a problem.

According to Chapman that’s even true for some AI problems that are completely formalized in math.

Chapman described that he sometimes uses anthropology. If I look through LessWrong's relationship to anthropology, I find post titles like “Anthropologists and "science": dark side epistemology?” by Anna Salamon.

There are problems that are more likely to be solved by learning to be a better anthropologist than by learning to be a better bayesian.

On another note, the concern that it is problematic to boil down all concern about uncertainty to a single number is also absent from LessWrong.

Nassim Taleb writes a lot about how people become irrational by trying to treat all uncertainty as a matter of probability.

Scott writes in his latest post: “An Xist [Bayesian] says things like ‘Given my current level of knowledge, I think it’s 60% likely that God doesn’t exist.’ If they encounter evidence for or against the existence of God, they might change that number to 50% or 70%,” instead of “You can never really be an atheist, because you can’t prove there’s no God. If you were really honest you’d call yourself an agnostic.”

In a recent Facebook post Taleb wrote: “Atheists are just modern versions of religious fundamentalists: they both take religion too literally.” By which he means that atheists make an error because they think that religion is really about God.

If you start by thinking in probability of whether God exists you still think it’s about God and block yourself from framing the issue differently and perhaps getting a different understanding of it.

That also has interesting implications for whether LessWrongism/Bayesianism is a religion ;)

## One important distinction

I generally agree with what Kaj Sotala says. Bayes isn’t the whole picture, there’s obviously lots of other important parts to thinking, particularly things like forming the correct categories, asking useful questions, and deciding what to do with your answers. Obviously individual cases of thinking, in particular solving toy problems, may have the solutions to some or all of these steps handed to you on a platter, so only some subset are required.

I mentally picture thinking as a serial process, with steps like noticing the presence of an interesting problem at the start, steps like formulating a good vocabulary further in, updating on evidence still further in, and stuff like deciding what to actually do near the end. Trying to use Bayes’ theorem for any step other than updating_on_evidence makes about as much sense as modifying your car to use its engine as a wheel, but concluding that Bayes is therefore bad makes about as much sense as throwing out the engine because it isn’t circular enough.

Where I think we diverge, if we diverge, is that Bayesian thinking (preferably applying the theorem, otherwise using it as an approximation or a constraint) is the right way to do an important part of the thinking process. If you do something else instead, then you will get worse results.

There is, however, an important distinction to make between Bayes and the content of this post. Bayes is mathematised. It is both precise and provably optimal for its task. What you give here are heuristics; they are probably better than what people do naturally, but not precise enough for the question “are they optimal?” to even be meaningful. This is not a criticism (you never claimed they were anything else), but it is a distinction.
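The sense in which Bayes is “mathematised” can be shown in a few lines. This is a minimal sketch of the kind of update the theorem licenses, with made-up numbers (a 1% prior, and evidence nine times likelier under the hypothesis than otherwise):

```python
# Bayes' theorem: P(H|D) = P(D|H) * P(H) / P(D).
# A toy update: evidence with known likelihoods shifts a prior belief.
def bayes_update(prior, p_data_given_h, p_data_given_not_h):
    """Return the posterior probability of H after observing the data."""
    p_data = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / p_data

# Prior belief of 1% in H; the evidence is 90% likely under H, 10% otherwise.
posterior = bayes_update(0.01, 0.9, 0.1)
print(round(posterior, 4))  # 0.0833
```

Note that every input here is a single precise number; that is exactly what gives the theorem its “provably optimal” character, and also what the heuristics in the post are not precise enough to claim.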

I do not claim to have a mathematically precise and optimal method for most stages of the thinking process, but where we might diverge is that I (and I think most LessWrongers) would like to have one and think there is reasonable hope of getting one, whereas many people (including you?) either don’t think such a technique can exist, or wouldn’t want it to.

This is what makes Bayes (the theorem combined with the idea of beliefs as probabilities) so important. It serves both as an example of what we want and as evidence that this might be a sane thing to look for, if we did it once we’re more likely to be able to do it again (look! Bayes! … sorry).

In addition, I suspect that because the updating_on_uncertain_evidence part of the problem is solved it looks easy, at least to smart people like you. The finding_a_good_way_of_talking_about_the _problem part hasn’t been solved, so it looks hard. This makes Bayes look less important than it is, because it only solves the ‘easy’ bit of the problem. The mistake in this thinking should be obvious.

I also don’t think I’m taking an atypical view for LessWrong anywhere here, since I’m pretty much just rephrasing some posts from the sequences (which sadly I can’t find). If you think that the above needs to be emphasised more, then I fully agree, and will add it to my long list of ‘things that aren’t very good about the actual day-to-day practice on the LessWrong message board’.

## The Best Introduction

First off, I want to say thanks for posting this as is, rather than spending more time editing it. I don’t think your worries about not spending enough time writing it were well-founded; I think it was clear enough to get your point across, and I enjoyed the flow.

I agree with Kaj that I think what you’ve written so far would fit well into the LW consensus. I’ll also add on the specific encouragement to post it on LW. I came to LW, was excited, but then looked around and asked myself “where’s the decision analysis?” It’s an existing academic and professional field devoted to correctly making decisions under uncertainty, which seems like it should be a core component of LW. I was familiar with it from my academic work, and realized that the easiest way to get it onto LW was to put it there myself. So I wrote a sequence on the mechanics of decision analysis, and reviews of important books related to DA, and they’ve all been positively received. I’m also exposed to much more in the way of potential collaborators and other interesting ideas from being on LW regularly.

In particular, I want to comment on:

I’d modify this slightly; the self-help version is

improving your informal reasoning is probably more important than improving your technical methods. In particular, the relevant LW post is The 5-Second Level, which argues that in order to actually change rationality in the field, you need perceptual triggers and procedures, rather than abstract understanding that is not obviously useful and thus never gets used. This quote in particular might interest you:

And now I’ll finally get around to the subject of this comment: I agree with you that Bayes is probably not the best introduction to LW-style rationality, because it’s only part, and not necessarily the core part. I’m not sure what is the best introduction, though. A common one seems to be heuristics and biases, but that’s not that great either, because the goal is not “find out how you’re dumb” but “find out how to be as smart as possible.” Finding errors to fix is important but insufficient.

What I think is best currently is just introducing the idea of ameliorative psychology (I was introduced to the term by Epistemology and the Psychology of Human Judgment, which I had read before finding LW). Basically, the premise seems to be “thinking real good is a skill that can be studied, researched, and developed.” If you could tell people in 15 minutes the fundamental secret that would make them awesome, there would be many more awesome people running around, but I think that in 15 minutes you can tell people “hey, decision-making is potentially your most important skill, and you might want to actively develop it.”

## Agreements and disagreements

Thank you all for the thoughtful comments!

Kaj— Yes, if we’re in violent agreement, that would be a happy outcome! And, indeed, there is much we agree on. There was much we agreed on to start with (e.g. formal rationality is important, and everyone should understand probability theory). We agree on more now that I understand “informal Bayesianism” better, thanks to you and Vaniver!

Like Christian, I do think there is still substantial disagreement. Partly it’s just a matter of emphasis. [Portentous radio announcer voice:] “Bayesianism: 1% or 99% of rationality? Or SOMEWHERE IN-BETWEEN!” But I do think we’d disagree about some specific propositions as well. Whether and how to pursue clarity about that, I don’t know. I suspect we are *still* talking past each other, to a significant extent, and if it appears that there’s only a difference of emphasis, it’s because I haven’t gotten around to actually stating my position. (On the theory that “how to think” was more interesting and useful and less obnoxious.)

Scott— The analogy with virtue ethics is interesting, and I think discussion of it may help clarify our approaches. You will recall that I loved your original critique of MacIntyre’s virtue ethics, and we had an interesting discussion of it here.

We agreed that virtue ethics has “no there, there.” Now, if “virtue, deontology, consequentialism” are exhaustive alternatives, that prompts a Bayesian update in the direction of the other two leading brands. But my view (expressed in that comment thread) was “none of the above.” Not in favor of a fourth, but in favor of “we know none of those work, and we have to do the best we can anyway.” Taking that seriously leads in entirely new directions, instead of rehashing the standard arguments, which don’t go anywhere because none of the Big Three can work.

(By the way, I found your “Whose utilitarianism?” post, written about the same time, very interesting because it seemed like maybe you were open to that possibility.)

What’s attractive about deontology and utilitarianism, and unattractive about virtue ethics, is that the first two appear to be *complete systems*, so that if only one could get the details right, you wouldn’t need to think about ethics. You could just consult the system and be done. And if one could be made to work, that would be great! Thinking is mostly an unpleasant time waster. But, my view is that they are unfixable, and the possibility of a systematic ethics is a chimerical fantasy. It’s actively harmful because it points *away* from thinking about ethics, into thinking about technical, meta-ethical problems instead.

So, if what I wrote in “how to think” looked like virtue ethics, it’s probably only because it’s non-systematic. It doesn’t hold out the possibility of any tidy answer.

I would *love* to have a tidy system for how to think; that would be hugely valuable. But I believe strongly that there isn’t one. Pursuing the fantasy that maybe there could be one is actively harmful, because it leads away from the project of finding useful, untidy heuristics.

One might continue to pursue utilitarianism, or a general theory of rationality, on the basis that there is good evidence that such an answer exists. But I see only evidence against that.

So, when I see people pursuing them anyway, my gut feeling is that it is “religiously” motivated: based on faith despite evidence, as a way of avoiding recognizing unpleasant truths. The unpleasant truth is that the fundamental problems of human existence are inherently messy and ambiguous—nebulous—and there are no systematic answers.

“There *must* be a way to fix utilitarianism, because otherwise *the universe would suck*.” “Bayes *must* be The Answer To Uncertainty, because otherwise *we’d be helpless animals, and that wouldn’t be fair*.”

Obviously, no one says such things… and maybe that’s not what’s going on at all… but it may be worth considering with an open mind.

Jumping to a meta level: This is the point at which the topic of my post coincides with the topic of Meaningness. This site is about what happens when you face up to nebulosity, and abandon hope of finding Eternal Truths.

Christian— Thank you for your comment! I think you have understood where I’m coming from.

Ben— Thanks for your comment also! I hope I’ve addressed the points you made in my replies to Kaj and Scott above. One clarification: I do think there are times when Bayes is the best tool for the job. However, I don’t think it is “correct” with respect to reality. It’s obviously correct as mathematics; but there’s always a gap between reality and math. (So I don’t think there is another alternative that *is* correct with respect to reality.)

Something I’ve thought about writing is a list of the preconditions that have to be in place for Bayes to be the right tool for a job. It’s a long list. I think you would probably agree that each is required. Some are rarely satisfied. That might make it clear why I don’t think Bayes is very exciting. (Formal Bayes, not the “informal Bayes” that I’m now sold on.)

Vaniver— Thanks for the encouragement to post on LW! I’m taking that seriously as a possibility now.

Thanks also for the rest of what you wrote. I hadn’t come across “ameliorative psychology” before. I have the pages you referenced open in browser tabs, and may have more to say after I’ve read them.

## @David

@David

I was tempted to just write “what everyone else said”, but instead of joining the agreement circlejerk, I’ma instead complain that even though I get what you mean, and I like the references, I still want moar.

It’s a flaw of the existent sequences that they already assume more-or-less complete formulations of what the question is, and only care much about picking the right choice once you already have some idea what is sensible and relevant to the question. (The only real counter-example I can think of off the top of my head is Harry in HPMOR’s chapter 16 enumerating as many ways as possible to use the objects in a room for combat purposes.) I don’t think most people on LW would substantially disagree with any of these points, or that this flaw exists, but still, so far no one really complained much about it or attempted to fix it.

So I’d still like to see worked examples of problem formulation, if only because I’m a selfish jerk who wants to get as much practice out of you as possible. Thus, I’m totally unconvinced and disagree with everything, you’ll have to convince me more, with concrete examples and more anecdotes, and last time I counted the number of your long-term projects is still single-digit, so why aren’t you writing more books already?

(This is of course the same tactic as publicly declaring that “X SUCKS AND X-ISTS ARE STUPID, X DOESN’T EVEN Y” when you’re too lazy to figure out how to do Y yourself, ‘cause it will trigger many an X-ist to write such excruciatingly detailed explanations for you.)

## Aye, there's the rub!

Yeah, that…

## "Catherine G. Evans"

It’s Catharine, peasant.

## With an a

Very sorry; fixed.

## Virtue Epistemology Is A Thing

You, the generic reader, can read more about standard virtue epistemology in encyclopedias :-)

As a sometime student in this area who is aware that her idiosyncratic fandom puts her in the minority, I think it might be helpful to explain why I bother with it. In a nutshell, I think human beings can be understood by assuming they use a lot of reputational reasoning, and I think reputational reasoning is understudied relative to its importance, possibly because the area is both controversial and large, and hence hard to make progress in.

The more totalizing meta-ethical approaches both seem to help structure existing human debate practices that are relatively formal, like “planning stuff out as a group in order to achieve the goals of the group” (consequentialism) and “writing and interpreting laws for settling disputes” (deontology), and then for purposes of philosophy they turn the idealism up to 11 by “aiming for utopia” and trying to figure out “God’s laws”. Then the idealized reasoning can be backported into more pragmatic areas and used extensively in the day to day application of power.

My interest in virtue ethics stems from the observation that a huge amount of how actual people seem to actually behave in non-governmental contexts (e.g. as individuals and in small groups) is significantly accounted for by highly social reasoning (at least partially implemented by instincts) that builds off of an expectation that teams win against individuals and that the fundamental attribution error is a shortcut that many of the agents in the system will use and expect others to use. Theories of attribution, imputation, and reputation become relevant, and it seems that human beings do a lot of tacit reasoning about the extended consequences of these processes. From within discourse processes you can see evidence for this, as with exhortations to think in terms of goals rather than roles. Advice to think in terms of roles rather than goals is less common because people are often already doing that.

Or another example: the old forms of rhetoric included not just logos (appeals to logic) and pathos (appeals to passion) but also ethos (appeals from the character of the person speaking)… however, children don’t have very much distinctive character to practice and older people are usually less receptive to teaching. If you put ethos in your text in modern times, even justifiably (rather than leaving it safely in your subtext), you’ll often be accused of a “logical” fallacy (authority or ad hom are the usual ones), especially if your interlocutor is arguing in bad faith… which is of course a conclusion people leap to so often, with such bad consequences, that some communities have formal advice to do the opposite. The whole area is ripe for trolling because it is emotionally and intellectually complicated.

If there is an obviously singularly awesome way to explicitly think about reputation in general and epistemic reputation in particular, it has not been formalized yet, to my knowledge.

Thus, if you want to understand and engage with most actual people *according to an explicit and helpful theory* rather than by feel, descriptive empirical virtue ethics research is probably indicated. Scott thinks virtue ethics isn’t that useful, but I think that’s because he wants his theory to tell him which policies to insightfully advocate more than he is trying to formally Solve Ethics before the singularity arrives, while (when I’m in a rare crazy heroine mood) the latter seems more urgently important to me than the former.

A barrier to research (and potential object of study in itself) in virtue ethics in general, and epistemic virtue ethics in particular, is that there are a lot of conflations and confusions that happen in the tacit reasoning people actually engage in, and in various theoretical accounts of it. If someone had a good theory, it would sort of be about who is the most authoritative (and thus who should be deferred to when a specific debate falls within their jurisdiction?). It would also sort of be about who is best at discovery (and thus who deserves research grants?). It would also sort of be about pedagogic ideals (and thus curriculum design?). It would also sort of be about “fitness to rule”, which is an independent reason to predict that the subject will make people crazy :-(

For what it is worth, I thought David’s original essay was full of interesting thinking on the operationalized pedagogic side of the issue of “how to think good” that I’m grateful for having been collected together (I’ve bookmarked a few things for later study). And I am sympathetic to the difficulty of talking about the general subject area (which sort of inherently involves various status moves) in public forums where all kinds of signalling/marketing/status stuff that relates to real world resource allocation is happening simultaneously. It seems possible that even if everyone party to the discussion was a Bodhisattva and knew that the others were as well, there might still be somewhat complicated interactions because of predictable pragmatic consequences of not doing at least a bit of zigging and zagging. Consider how weird dharma combat looks.

Despite the various constraints people are working within, I also suspect the ongoing back and forth process (including the previous posts on various blogs) has been pedagogically educational for a lot of lurkers and is praiseworthy on that basis alone :-)

## It's a thing!

Hi Jennifer,

Thank you for a very interesting comment!

I had no idea virtue epistemology was a thing. I’ve started reading the SEP article and it looks highly relevant so far. (It’s rather long… I may follow up here after I’ve finished it.)

What you said about virtue ethics made me think there’s more to it than I had supposed. Mental note to investigate further.

I took a quick look at your blog, and found a discussion of Dreyfus’ paper on Heideggerian AI. I’m not sure how you found my site, or if you know that “Heideggerian AI” consisted mostly of me and my collaborator Phil Agre. (I wrote about that here.) An interesting coincidence if you didn’t know.

I hope that you are right that this discussion has been useful for lurkers! It’s been useful for me, anyhow.

## Heidegger, Buddhism, flow

Oh, also, since you mentioned dharma here, and on your blog the connections between Heidegger and flow, you might find interesting my article on connections between Buddhist tantra and flow, which was also influenced by my reading of Heidegger.

## Model comparison

You complain that probability theory “collapses together many different sources of uncertainty”. It doesn’t. People do that. Probability theory is perfectly capable of separating all of these, if programmed by a willing and well-educated user.

You say that you solved your brick-stacking problem using no probability theory, only logic, but have you noted that binary logic is a limiting case of probability theory? Further, have you noted that zero uncertainty on any issue (needed for Aristotelian logic to work) is typically only an approximate model for how well we really know the world?
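The limiting-case claim here can be made concrete. This is a minimal sketch, with hypothetical certainty values: when every probability is pinned at 0 or 1, the law of total probability simply reproduces deduction, and relaxing a premise slightly shows what binary logic leaves out:

```python
# When all probabilities are 0 or 1, probabilistic reasoning reduces to
# ordinary deduction. With P(B|A) = 1 and P(A) = 1, we get P(B) = 1:
# modus ponens as a limiting case of the law of total probability.
def p_b(p_a, p_b_given_a, p_b_given_not_a):
    # Total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

print(p_b(1.0, 1.0, 0.0))   # 1.0  -- certain premise, certain conclusion
print(p_b(0.95, 1.0, 0.0))  # 0.95 -- slight premise uncertainty propagates
```

The second call is the point of the objection: Aristotelian logic has no way to represent the 0.95, so treating a premise as exactly certain is itself only an approximate model of what we know.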

You seem to characterize the typical Bayesian as somebody who believes that all inference must explicitly involve Bayes’ theorem, but I think you’ll struggle to actually find many people who hold that view. What I believe (along with many others) is that the success of any procedure of inference is derived from its capacity to mimic the outcomes of probability theory. Furthermore, we can often gain insight into the value of some heuristic procedure by comparing its structure to that of probability theory.

I applaud your efforts to develop efficient heuristics for problem solving. Problem formulation is certainly a fascinating and highly important topic, and simply being Bayesian doesn’t guarantee an effective toolkit for this. Your analysis of the relationship between probability theory and problem formulation is flawed, though.

You say:

“I’ll talk in general about problem formulation, because that’s an aspect of epistemology and rationality that I find particularly important, and that the Bayesian framework entirely ignores.”

In a comment on another thread you directed me here, saying: “Choosing a good hypothesis space is typically most of the work of using Bayes’ Theorem.” Certainly, without a suitable hypothesis space, we are stuck. To test whether a set of hypotheses is appropriate for solving some problem, we will (usually informally) look at the various P(D|H) - if they are all extremely low, then we should try again. But is there a way to formalize this procedure? Some methodology that our informal methods seek to approximate? By golly, I believe there is. It’s called Bayes’ theorem. See, for example, my articles on model comparison and model checking.
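The formal procedure gestured at here, comparing hypotheses by how well they predict the data, can be sketched in a few lines. The two models (a fair coin vs. a hypothetical biased one) and the data are invented for illustration:

```python
# Bayesian model comparison: the posterior odds of two models are the
# prior odds times the ratio of their marginal likelihoods (the Bayes
# factor). Here two coin models are compared on hypothetical data.
from math import comb

def binomial_likelihood(p, heads, flips):
    """P(data | model): probability of `heads` heads in `flips` tosses of a p-coin."""
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

heads, flips = 8, 10
l_fair = binomial_likelihood(0.5, heads, flips)    # H1: fair coin
l_biased = binomial_likelihood(0.8, heads, flips)  # H2: coin lands heads 80% of the time
bayes_factor = l_biased / l_fair  # > 1 favours the biased-coin model
print(round(bayes_factor, 2))  # 6.87
```

If *all* the candidate models assigned the data extremely low likelihood, that would be the formal signal, which Tom describes informally, that the hypothesis space itself should be revisited.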

There is actually an active sub-discipline, within ‘Bayesian’ research, attempting to develop efficient techniques for hypothesis formulation. You should find out what is being done within a framework, before declaring what it ignores.

## Following up

Thanks Tom, these are interesting points. They deserve a longish reply. I’m at the Buddhist Geeks Conference this weekend, and early next week will be busy, so I may not get a chance until late next week—sorry!

In the mean time, re “an active sub-discipline, within ‘Bayesian’ research, attempting to develop efficient techniques for hypothesis formulation”—Could you point me at this? Some of the major papers/researchers or an overview/introduction? Thanks!

## Categorization is the true 'Beginning of Infinity'

Hey Tom, when you have a hammer, everything looks like a nail. I wish Bayesians good luck trying to use Bayes’ theorem to formalize a procedure for testing the hypothesis space (‘model comparison’ and ‘model checking’). They won’t succeed, because the process of problem formulation can never be reduced to probabilistic reasoning.

The main usage of probabilities to date has been prediction, but as David Deutsch convincingly argues in his two superb books ‘The Fabric of Reality’ and ‘The Beginning of Infinity’, science is not primarily about prediction, it is about *explanation*.

Yes, binary Aristotelian logic is indeed just a special case of Bayesian (probabilistic) logic, but this sort of thing should actually be highly alarming to Bayesians, because it demonstrates that a framework that everyone thought had ‘solved’ logic for centuries was, in fact, just a special case of something far more powerful, namely probabilistic (Bayesian) logic. What makes you think that Bayesianism won’t suffer the same fate?

What framework could be more general than probability theory, you cry! I can suggest an answer: Categorization/Analogical Inference.

Picture a mind as a space, and ‘the laws of mind’ are analogous to the principles of cognitive science.

Now in this ‘mind space’ picture the ‘mind objects’ - I suggest these are logical predicates, symbolic representations of real objects. How do these ‘mind objects’ interact? I suggest picturing ‘mind forces’ as analogous to the ‘strengths of relationships’ between the mind objects (predicates or variables), so ‘mind forces’ are probability distributions. But what about the background geometry of mind space? I suggest picturing ‘curvatures’ in the geometry of mind space as analogous to concepts (categories or analogies).

Then symbolic logic is the laws governing the mind objects (rules for manipulating predicates), Bayes (probability theory) is the laws governing the mind forces (rules about probability distributions), and analogical inference (categorization) is the laws governing the geometry of mind space itself (concept learning and manipulation).

Bayes is but a stepping stone; Categorization is the real ‘Beginning Of Infinity’.

## Single-number probabilities, etc

David, thanks for this post. The “brain dump” worked out well, and I think the comments show that thoughts were nicely provoked.

Christian notes “the concern that it is problematic to boil down all concern about uncertainty to a single number is also absent from Less Wrong.” I want to add a few more reasons to limit the application of Bayesian calculations. In the terms of Scott’s blog-post, I advocate keeping one foot in Aristotelian certainty, and one foot in “Wilsonian” relativism, while keeping a solid footing in probabilistic beliefs. (I’m a quadruped, or maybe a millipede.)

On the Aristotelian side, there’s plenty of stuff we just flat-out believe, and that’s as it should be. There’s no need to put a number on it, nor does that mean that the belief isn’t revisable. You just revise it when the need arises, without specifying in advance how much evidence would call for such revision.

On the relativistic side, sometimes just saying “I don’t know” is better than saying “50-50 chance” or any other number. This isn’t just to say that pulling numbers from one’s nether regions is too much bother with not enough payoff. After all, Bayesianism and (my real target) various utility theories are supposed to be normative, not prescriptive. So it’s no mark against them to admit that putting numbers on beliefs and crunching them in the Formula is seldom necessary. Rather, the point is that even if you had all the time in the world, or just a burning epistemological curiosity, sometimes there is still no point in putting odds on a proposition. We don’t always have to bet. We don’t always have a uniquely appropriate reference class with known statistics. There just isn’t any sufficiently general reason to generate probabilities that I am aware of. And I’m willing to bet :) that the reason for the unawareness is the nonexistence of that sufficiently general reason.

## Physics

Marc,

when the methods of science work, it is because they successfully mimic (and sometimes even directly employ) probability theory. For anybody to say that “the main usage of probabilities to date has been prediction” is a gross oversimplification. See my posts Total Bayesianism and Natural Selection By Proxy as simple explanatory examples.

There may be an effective theory of inference that is more general than probability theory, but from your description, it won’t be categorization inference. You claim this theory as some sort of framework modeling mind spaces and mind objects. That’s fine, if it really works, but what I’m concerned with is much more general than that - real spaces, and real objects, of which mind spaces and mind objects are quite a small sub-population. Since probability theory handles this wider scope quite well, I dare say whatever your theory of minds can get right, probability theory can replicate. These are physical entities we are talking about, if you want to infer their mechanics, observations and probability theory will do just fine.

## Bayesian hypothesis space choice?

Tom— First, a general response to your approach, which seems to be “Bayes’ Theorem can do everything.” (Is that an unfair summary? Your “Total Bayesianism” post seems to say pretty much that.)

Let me suggest another counterexample. (It may be easier for you to understand, as a physicist, than the logic-based planning one.) You are designing an airplane wing. You need to be confident that frictional heating is not going to exceed the rated maximum for the material. Both heating and heat dissipation depend on the exact shape of the wing.

This is a situation of uncertainty. Is Bayes the best tool for the job? I’d suggest using fluid dynamics and the heat equation instead. (Both are systems of differential equations that can be solved numerically.) Would you argue that Bayes subsumes differential equations?
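To make the contrast concrete, here is a toy sketch of the kind of tool I mean - a 1-D heat equation solved by explicit finite differences. (The geometry, constants, and boundary conditions are made up for illustration; a real wing analysis would be vastly more elaborate.)

```python
# Toy 1-D heat equation u_t = alpha * u_xx, solved by explicit
# finite differences. Constants and geometry are illustrative only.

def heat_step(u, alpha, dx, dt):
    """Advance the temperature profile u by one explicit Euler step."""
    new = u[:]  # boundary values stay fixed
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * dt / dx**2 * (u[i-1] - 2*u[i] + u[i+1])
    return new

def solve(u0, alpha=0.01, dx=0.1, dt=0.1, steps=100):
    """Run `steps` time steps (alpha*dt/dx**2 = 0.1, safely stable)."""
    u = u0[:]
    for _ in range(steps):
        u = heat_step(u, alpha, dx, dt)
    return u

# A bar that starts hot in the middle, held at zero at both ends:
u0 = [0.0] * 21
u0[10] = 100.0
u = solve(u0)
# The hot spike diffuses outward and decays; no probabilities involved.
```

Nothing in this calculation is probabilistic; the uncertainty about how hot the material gets is resolved by physics, not by Bayes’ Rule.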

I have two responses to this. First, this appears to be a “no true Bayesian” (i.e., “no true Scotsman”) argument. People do routinely model multiple co-occurring forms of uncertainty with a single number. I suspect also that the set of types of uncertainty is ill-defined. As far as I know, there is no checklist that Bayesians regularly go through to separate out different types of uncertainty and model them separately.

Second, I suggest that many of these types of uncertainty are not typically best modeled probabilistically. Different approaches are better used in different cases. In general, I suggest that figuring out what is actually going on is usually a better approach than probability, which is typically best left as a last resort. For example, in cases of measurement error, the best thing is to reduce the error by improving your measurement technique. Failing that, you are better off figuring out what is causing the error and building an explicit model, in which case you may be able to correct it systematically. Only when that fails does it make sense to apply some sort of general probabilistic slop factor.

You are referring to Cox’s Theorem, I assume.

This is widely misinterpreted in pop Bayesianism as “Bayesianism subsumes logic; it’s the only correct generalization of logic when there is uncertainty.” This is wrong in multiple ways. I’ll point out just one: probability theory generalizes only propositional logic (AND, OR, NOT).

Propositional logic is to logic as Roanoke Island is to America. It’s the starting point, but it’s a tiny part of the whole; and, in practice, pretty much useless by itself.
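A minimal sketch of what “extends propositional logic to real values” means (the uniform joint distribution here is my illustrative choice):

```python
# Toy illustration of "probability extends propositional logic":
# assign a joint distribution over truth assignments to atoms A and B,
# then the probability of any compound sentence is the total weight of
# the worlds where it is true. (The uniform weights are arbitrary.)
from itertools import product

def prob(event, joint):
    """Sum the joint probability over worlds where `event` holds."""
    return sum(p for world, p in joint.items() if event(*world))

# Joint distribution over (A, B): four worlds, equal weight.
joint = {w: 0.25 for w in product([True, False], repeat=2)}

p_and = prob(lambda a, b: a and b, joint)  # P(A AND B) = 0.25
p_or  = prob(lambda a, b: a or b,  joint)  # P(A OR B)  = 0.75
p_not = prob(lambda a, b: not a,   joint)  # P(NOT A)   = 0.5

# With all weight on one world, these collapse to two-valued logic.
# Note there is no machinery here for quantifiers ("for all x ...").
```

The whole construction lives at the level of AND, OR, and NOT; quantified statements never enter.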

Real logic begins with predicate calculus, which is massively more powerful and complicated. Predicate calculus trivially subsumes probability theory. The converse is not true. (I’ve done a little googling, and it appears that there are Bayesian extensions of predicate calculus. Whether those have any value, I don’t know. They don’t appear to be widely used.)

The classical planning solution used several levels of logic beyond basic predicate calculus.

Bayes’ Rule has absolutely no applicability to this problem. (That’s a challenge…)

Yes, obviously. Recall that in my earlier post, I strongly advocated everyone learning probability theory in high school. I am not arguing against probability theory. I’m arguing that it is a tool, with limited applicability, and not The Answer To Life, The Universe, And Everything.

Again, I’d be interested in pointers to this research.

Or, do you mean Bayesian model comparison itself? That doesn’t address the issue, at all!

The issue is that the space of possible hypotheses is effectively infinite. (See the discussion in the post, starting with grains of sand.) Model comparison is computationally expensive; you cannot compare very large numbers of alternative models. Virtually all possible models need to be eliminated (somehow!) before you could apply it.
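A back-of-envelope sketch of that explosion (the factor counts below are arbitrary):

```python
# Why exhaustive Bayesian model comparison cannot search an open-ended
# hypothesis space: even the restricted question "which subset of N
# candidate factors matters?" yields 2**N models to compare.
# (The factor counts below are arbitrary.)

def model_count(n_factors):
    """Number of distinct subset-models over n candidate factors."""
    return 2 ** n_factors

for n in (10, 40, 80):
    print(n, model_count(n))
```

At 10 candidate factors there are about a thousand models; at 80, around 10^24. Something other than pairwise comparison has to do the eliminating first.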

## Vaguely uncertain, and utilitarianism

Paul— Interesting comments! I think we have similar perspectives.

People usually talk as if their state of belief was “pretty sure” or “could be, I guess” or “I haven’t a clue.” Maybe those represent unconscious numerical probabilities; but the burden of argument for that hypothesis is on people who believe so. (I don’t know of any evidence for it.) Maybe we don’t unconsciously use numerical probabilities, but ought to. Again, I don’t know of an argument for this beyond “we can’t think of a better idea.”

I think that the “I haven’t a clue” case is important. As you say: “sometimes ‘I don’t know’ is better than ‘50-50 chance’ or any other number.”

How much confidence you have in your degree of confidence—often that’s important! If you know for sure something’s a 55-45 bet, it can be rational to leverage up and bet big. If your best guess is 55-45 but basically you are pulling a number out of your nether regions, you’d do better to walk away.

There are various extensions to probability theory that try to deal with this, but they’re complicated, and it becomes even more obvious that you are pulling details out of nether regions when you are talking about the shape of a probability distribution function or something.
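The 55-45 point can be made concrete with the Kelly criterion for a repeated even-money bet (a standard formula, brought in here for illustration; the 55-45 numbers are from the example above):

```python
# Kelly growth rate for a repeated even-money bet: staking a fraction f
# of your bankroll when the true win probability is p gives long-run
# log growth g(p, f) = p*ln(1+f) + (1-p)*ln(1-f). (Standard formula,
# used here only to illustrate the 55-45 example.)
from math import log

def growth_rate(p, f):
    return p * log(1 + f) + (1 - p) * log(1 - f)

f = 0.10  # the Kelly stake for a *known* 55-45 edge: f = 2p - 1

g_real     = growth_rate(0.55, f)  # the edge is real: slow growth
g_imagined = growth_rate(0.45, f)  # the edge was imagined: faster decay

# The loss rate when you are wrong exceeds the gain rate when you are
# right, so uncertainty about p itself argues for betting less, or not
# at all - exactly the "walk away" intuition.
```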

I suspect, instead, that our levels of confidence are inherently vague, and that they should and must be inherently vague. Quite what that implies, I’m not sure!

I too regard criticism of Bayesianism-as-religion as a preliminary step to criticizing utilitarianism.

## Real-world vs. abstract reasoning

David,

Thanks for taking the time to reply in detail. You ask if I think that differential calculus is part of probability theory: No, it’s not. In fact differential and integral calculus are indispensable for deriving much of probability theory.

Yes, we can do some sort of inference with differential equations, but that inference is strictly limited to an abstract domain – the entities we learn about do not actually exist. When I say that probability is the ultimate model for solving problems, I really mean problems of inference (including problems of decision – inferring ‘what is the correct thing to do here’) about the real world. You can’t solve such problems without using something that at least approximates probability theory.

You promise a counterexample involving aeroplane design, but you fail to deliver. No, I’m not saying that fluid mechanics and what-not are the wrong tools to employ, but these are models of reality. Inferring appropriate models and assigning values to the coefficients in such models are issues of judgment under uncertainty. Furthermore, the expected cost of getting the model or its parameters wrong is not known precisely. If we are strictly honest with ourselves, we will acknowledge the very small but non-zero probability associated with the divine-intervention theory of flight, and we will recognize that our knowledge of the physical parameters of any suitable model is best described by a probability distribution. We don’t need to explicitly use Bayes’ theorem in this case, but that is only because the probability distributions are sufficiently narrow (actually, would be sufficiently narrow, if we could be bothered with the analysis – probability distributions do not exist on their own). Thus, BT is not the ‘best tool for the job’ in this case.

Thus, when we learn something about the real world using pure mathematics, it is probabilistic learning – e.g. if I build an aeroplane using this procedure, it’ll probably fly. We need probability, or some workable approximation, to bridge the gap between abstract reasoning and understanding of the real world.

Incidentally, if you wonder how we could ever have inferred the principles of abstract reasoning (needed to derive probability theory), I would suggest that our ability to effectively do so comes from using procedures that can be modeled as approximately probabilistic. Probability theory works because it works. Human brains work because they approximate probability theory. Abstract reasoning works because it is derived by human brains, figuring things out under uncertain empirical input.

You say that predicate calculus contains probability theory. Granted. But if you know a way to actually learn something about the real world using elements of predicate calculus that are not probability theory in disguise, then please show it to us. There are some simple statements that seem to be demonstrable in this way, such as the cogito, but if you make use of logic, how do you know that your formalism will deliver the truth?

I’m afraid all the references I have at the moment on problem formulation relate to model comparison, which I see as a major component, though you evidently don’t. I remember reading something that tried (not massively successfully, as I vaguely recall) to go deeper, in terms of how to actually originate hypotheses efficiently, but I can’t find it now, sorry. But let’s consider a problem like inferring the probable cause of global warming. It is inefficient for us to include in our hypothesis space something like “global warming is caused by this grain of sand.” Why am I confident in saying that? Because nobody has ever observed a correlation between the identified sand grain and global warming, or even any especially analogous correlation. (I might be wrong to eliminate this hypothesis, but the expectation is that I am right – hence the efficiency of not including the idea.) How do we arrive at a principle like that? It is something overwhelmingly reinforced by our experiences from the moment we get pushed out of somebody’s uterus. What is the model for learning from experience? Probability theory.

One last thing I’d like to emphasize: when I say that model comparison is crucial, I don’t mean that we must laboriously go through Bayes’ theorem each time we want to make progress. Valuable heuristics exist, and probably even more potent heuristics await discovery, but again, their success is, or will be, derived from their ability to mimic probability theory. We can use this fact to guide our search for these heuristics, to prove their value, and to establish their limited domains of applicability.

## Probability and uncertainty are not the same thing

Tom— This is good: apparently we have important common ground. We agree that formal models can only ever approximate reality, and always leave some uncertainty due to imperfect abstraction.

The remaining disagreement is whether this uncertainty is captured by probability theory. If I understand you correctly, you believe that probability theory is an exception, and it bridges the gap between formal models and reality. My belief is that it is a formal model like any other. Like any other, it can only approximate reality. [Except at the quantum level, but I assume that’s not part of your argument.]

There’s a nice analogy with the quote “I contend we are both atheists, I just believe in one fewer god than you do. When you understand why you dismiss all the other possible gods, you will understand why I dismiss yours.” (Except that you do seem to understand why the other gods have feet of clay, so maybe I’m missing something…)

My impression is that you are confusing uncertainty and probability. Uncertainty is a feature of reality (or our relationship with it, anyway). Probability is a formal model of uncertainty. Sometimes it’s a good-enough model, and sometimes it’s a lousy one. The inability of probability to distinguish between having no knowledge and having excellent evidence that p=0.5 is an example. (Again, that can be fixed with a complicated extension to probability theory, but this is epicyclic; the same problem immediately reappears at the next level.)
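To sketch the kind of extension being alluded to (Beta distributions are my example, not necessarily what anyone here has in mind): a flat Beta(1, 1) and a sharply peaked Beta(500, 500) both summarize to “P = 0.5”, so the single number erases the difference.

```python
# Sketch: representing belief about a coin's bias p with a Beta(a, b)
# distribution. (Beta is my illustrative choice of extension.)
# Total ignorance and strong evidence of fairness share the same
# single-number summary, P(heads) = 0.5.

def beta_mean(a, b):
    return a / (a + b)

def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

ignorant = (1, 1)      # flat prior: no knowledge at all
informed = (500, 500)  # roughly, after ~1000 tosses near 50-50

assert beta_mean(*ignorant) == beta_mean(*informed) == 0.5
ratio = beta_var(*ignorant) / beta_var(*informed)  # spreads differ ~300x
```

Of course, the same question now recurs one level up: how confident are we in a and b? That is the regress at issue.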

In case it wasn’t clear: I’m not advocating logic as The Answer; as a competing god. It’s a formal tool. I put it on the same level as probability theory: both are useful in some circumstances and not others.

Quite so. But, how do we know what to include and exclude? “Analogous” is doing all the work here. That’s Marc’s proposal. It seems pretty hand-wavy to me. What counts as analogous? Determining that seems “human-complete.” There is no aspect of intellectual endeavor that it excludes.

Whoa, slow down there! Says who? On what basis?

Probability theory by itself is definitely not a full theory of learning from experience.

The field of machine learning has many different models for learning from experience. Many use probability theory. However, none of them come anywhere near human-like performance. In my judgement, incremental progress in machine learning is unlikely ever to produce human-like learning. Some radical new insight would be required, that we can’t now imagine.

My guess is that human learning does depend in part on mechanisms that could be understood in terms of probability theory, but we have to acknowledge that we have very little understanding of how it actually works.

What reason or evidence do you have for this?

## Newsflash: there are no gods!

David,

We certainly have much common ground. Hopefully my following long-windedness will not break your blog:

Your clever analogy gives me the perfect opportunity to explain more clearly what probability theory (PT) is. I don’t need to explain the privileged status of my god. PT is just another formal model, like all other mathematical tools. It is necessarily approximate (even at the quantum level), and like all other models it refers only to abstract entities. I alluded to this earlier, when I said that probability distributions do not exist on their own.

It’s what probability is a model of that makes it the king of theories: it is a model of what a rational agent is entitled to believe about the real world. This is why I say that it is a bridge: I could have the most sophisticated physical theory, with the most elegant mathematics ever devised, yet without any basis for attaching credence to it, declaring it to be the truth would be perverse.

The abstract entities that probability speaks of are rational agents. These idealized entities need to know certain things with absolute certainty. This doesn’t describe real objects. (I discuss this in Extreme values: P = 0 and P = 1, which confirms your apparent feeling that some Bayesians neglect the theory-ladenness of their art – I sympathize with your wish to correct technically competent people who make such rookie mistakes.)

In particular, the modelled rational agent must be absolutely confident (and rationally so) that some background information (a set of modeling assumptions) is true. Probability provides no means to guarantee this. This, you have correctly identified, but you seem to see it as proof that there must be another theory, waiting to be developed, capable of plugging this gap. What you are asking for is omniscience.

This kind of infallible knowledge is impossible – what would it mean? How would I know that my knowledge was infallible, and not just a mistaken perception of infallibility? The omniscient deity disappears up its own backside, in a puff of logical incoherence.

This uncertainty about the validity of the hypothesis space, etc. is handled quite gracefully by PT, though – if I’m worried that my search space is badly formulated, I just apply the theory recursively, starting from a higher level. (This shows that there is actually an alternative to infallible knowledge, but it’s also impossible: calculation on an infinitely deep hierarchy.)

You say that an ability to recognize useful analogies (correlations) is the hard part of learning, and I couldn’t agree more. But I see no credible reason to even suspect that this is not part of learning by experience. For humans that experience can be assumed to be a combination of personal experience and evolutionary history (hardware predisposed to learn efficiently).

You say that “probability theory by itself is definitely not a full theory of learning from experience.”

Let me be pedantically clear:

(1) I said that PT is the model for learning from experience. By this, I don’t mean that PT describes the microscopic mechanics of brains or digital circuits, or whatever other devices you have in mind. Much less a description of biological evolution by natural selection! When I say model, I really mean aspiration: what would a rational mind, exposed to the same information, infer? PT prescribes reasoning that is in a limited way optimal (computational overheads are not taken into account), such that if the output of some real machine differs significantly from a probabilistic calculation, then we should expect that our machine has not got the most out of the data (subject to caveats about theory-ladenness – but recall that if we suspect that such a difference is reasonably accountable by invoking differences in effective model assumptions, then we can check that without needing further technology beyond probability).

(2) In order for PT to do any work, there needs to be some hardware – a substrate capable of implementing mathematical operations. Furthermore, in order to allow learning from experience, it is obviously necessary to have some mechanism for that hardware to receive input. PT, taken literally by itself, is nothing. Apart from these obvious prerequisites, you give no evidence that probability “by itself is definitely not a full theory” (i.e., that it is an insufficient model for optimal inference), nor any reason to suspect this in the slightest.

Finally, you ask why I am so confident that PT successfully models rational inference. As described above, you cannot remove all uncertainty – probability is reasoning under uncertainty, so that’s a good start. My confidence (actually not 100% confidence, but I have found no credible reason for non-negligible doubt) in probability as the appropriate theory comes from the obvious undeniability of Jaynes’ desiderata, from which he derived the entirety of PT (obviously making use of pre-existing maths). These desiderata are:

1. Degrees of plausibility are represented by real numbers.
2. The theory must correspond qualitatively with common sense.
3. The theory must be consistent: if a conclusion can be reached in more than one way, every way must give the same result; all relevant evidence must be taken into account; and equivalent states of knowledge must be assigned equivalent plausibilities.

To deny any of these, we need to demand things that are obviously perverse, such as that our theory should provide different procedures that give different answers to the same question. These desiderata quite simply describe rationality, which is PT’s stated goal.

Nobody has given any evidence that PT is deficient in any avoidable way for inference under uncertainty. Nobody will ever demonstrate that uncertainty can be fully eliminated (or fully quantified). After a horribly long-winded explanation, these are the simple facts that account for my confidence in the appropriateness of PT.

Thanks for reading (if you got this far).

Cheers,

Tom

## Rapture of the nerds; Ainslie's cure for rationalism

I’m late to this party, but have to chime in because a puckish fate has landed me in the midst of the Silicon Valley LessWrongers, who congregate at the office of my new job. I have had a hard time holding my tongue at some of the nonsense. You are doing a good job of articulating the issues and problems and being reasonably polite, which I admire.

I don’t think the problem is Bayesianism so much as it is an extraordinarily nerdy picture of what life is all about. Bayesianism is a symptom of this more fundamental underlying problem. The nerd-rationalist views thinking as a matter of (a) figuring out true beliefs and (b) optimizing some utility function, which just is not how normal people spend their time.

Not that there is anything wrong with truth and utility and problem solving; they just don’t form a very complete picture; they don’t provide the universal acid of mind that the cultists would like. The overapplication of Bayesianism is just one manifestation of a more general tendency to glorify abstract and oversimplified thinking at the expense of the rest of life. Nerds tend to live in their brains and treat their bodies as something alien. The nerds of my era (and I am one myself, pretty much) tended to ignore their body entirely; the new breed treats it as another engineering optimization problem, and applies various diet and exercise fads while waiting for the problem to be solved properly by developing the technology to upload their minds into gleaming robot shells.

I am being too negative. In fact I have a complex relationship with all this stuff; I am simultaneously attracted to and repulsed by the hypernerdishness of the cult of rationality. To rescue this comment from being pure ad hominem nastiness: you said somewhere, I think, that you didn’t believe there could be a single simple totalizing model of thought, which I agree with of course. The work of George Ainslie has some tantalizing ideas about why this might be the case; not that I buy it or understand it fully, but his theory is that minds are inherently structured so as to be unpredictable to others and to themselves, for solid practical reasons. But its real strength is that it is a theory of mind that is rooted in desire and action, not abstract model building. Anyway, I recommend it.

## Probability theory is not an account of learning

Tom— Again there is some important common ground here, but we’re also still talking past each other. I think we are making some progress, so I’m up for continuing!

No; exactly the opposite. My main point is that omniscience is impossible, that any such theory seems highly unlikely, and in any case is currently unavailable. We agree about this!

The part of this before the parenthesis, and the sentence in parentheses, seem to contradict each other.

Applying the theory recursively generates an epicyclic infinite regress, I think; it’s unworkable. The parenthesis seems to say that, so I don’t understand what you are arguing for here.

No, I said it was arbitrarily difficult. I’m skeptical that “analogies” is a useful way of thinking about learning. That was your proposal, not mine!

The key disagreement now seems to be whether or not “probability theory is the model for learning from experience.” You say that it does not describe actual human learning perfectly, but that it is the ideal, and deviations from the ideal are due to hardware/wetware limitations.

You don’t seem to have made any argument that PT is a theory of learning. The burden of proof for that is on you; it’s not my job to disprove it, unless you have some argument that it is one.

Nevertheless, I have two objections: PT clearly isn’t a model of learning at all, and hardware limitations matter.

If PT were a model of learning, then the machine learning field would consist of a single well-defined problem, plus proposed algorithms for solving that problem. But there is no such definition. We have no formal account of what “learning” is. There are many definitions of formal learning tasks in the literature, all of which are recognized as very partial and inadequate. They don’t add up to a general account of what learning is, nor do they seem like realistic accounts of the situation faced by a real-world learner.

(How, by the way, do you propose PT is a theory of “recognizing useful analogies”, if you think that’s the essence of learning?)

Let’s postpone “hardware limitations matter” until/unless we sort out “isn’t a theory at all.”

The last part of your comment is about PT as a theory of inference, not as a theory of learning. Those two are not at all the same thing.

PT is not a general theory of inference. As a theory of inference, it only extends propositional logic to real values. It has nothing to say about logical quantification, which is where serious inference starts. PT is, in fact, incredibly weak as a theory of inference; a non-starter. That’s because inference isn’t its job.

PT is the correct solution to the formal problem it does solve, namely probability. What Cox’s Theorem shows is that it is the unique correct solution to that problem (if you assume various reasonable desiderata). This is altogether unsurprising, and has no significant implications.

This formal problem is not reality. Like every other formal system, PT models reality only approximately and abstractly.

Regarding Jaynes’ list of desiderata, “reliability of propositions to be encoded using real numbers” seems highly dubious. Lots of people have given good reasons to think this often doesn’t work. Maybe we are wrong, but at minimum you can’t just assert it as an unquestionable axiom.

David

## Life in Aspergistan

Mike— Ah, your new job circumstances perhaps cast light on your most recent essay, “I and Thou and Life in Aspergerstan”!

Yes, definitely. My intellectual interest here is in ideologized rationalism as a non-theistic manifestation of eternalism. It’s important because otherwise my critique of eternalism reads as just reiterating standard arguments for atheism. My point will be that the same emotional complex that leads to theism also leads to other eternalist pathologies, such as ideologized rationalism. If the world continues to abandon theism, non-theistic eternalisms will be increasingly dangerous.

Bayesianism is a plausible place to start, because it’s relatively simple, relatively easy to critique, and is fetishized by a group of ideological rationalists as the holy symbol of their quasi-religion. It’s not worth critiquing for its own sake, because there’s so little to it. The consensus of LessWrong folks seems to agree with this; some have said explicitly that putting Bayes up front was a presentational mis-step.

Utilitarianism is a more important target, because it’s the rationalist-eternalist’s account of ethics. Eventually I plan to discuss non-eternalist, non-nihilist ethics. (If I ever get time to work on this material seriously again, sigh!) Utilitarianism is potentially much more harmful than Bayesianism. It’s also more complicated. The general approach is: “you acknowledge that no existing brand of utilitarianism works, but you have religious faith that some version somehow must work; why is that?”

Yes, me too, obviously!

But more importantly, I really like some of the LessWrong folks, a lot, and have formed significant electronic friendships with some of them, without having ever met any of them. I’m actually slightly envious that you are working with them. I miss having smart people to talk to. (Twitter and blogs are the next best thing.)

Re George Ainslie: I read his book on your recommendation a couple of years ago, and was really impressed with it. Thank you again! I expected it to reorganize the way I thought about self, and it may yet do that, but so far the impact has been less than I expected. Probably that’s because I’ve had little time for reflection during that period, though. I plan to go back to it and re-read bits when I get a chance.

## This critique of Bayesianism is spot on

I have to admit I’ve learned a lot by reading these few blogs and a bit of the book itself. After reading a good chunk of what Chapman wrote, I had to admit that I had fallen for a cult.

Which is weird, because I said to myself I recognized the “cultish” parts of LW, and the importance of the language/model. But actually, I now see that I had attributed way too much “weight” to the powers of Bayes’ rule. It is really just a wrench.

## Not quite sure we are speaking the same language

Hi David

Again, my lack of brevity feels to be bordering on rude. My apologies.

You write:

yet a little later, you complain that,

It is only unworkable if you insist on trying to be omniscient - being absolutely certain that your model is correct. Why, in your attempted critique of probability theory (PT), do you keep drawing attention to the tentative nature of the modeling assumptions, if you accept that no other theory can eliminate this problem?

Apologies for not being clearer. I meant that if we want to try to remove the weaknesses of PT, there are two avenues we might investigate, (1) omniscience and (2) some infinite computation. Both are impossible to achieve, therefore the weaknesses of PT just have to be lived with.

All learning is a process of identifying useful analogies. Important examples include: analogies between an equation and a physical process, analogies between past behavior and present behavior, and analogies between different physical systems that appear to share important traits: this object looks like a human, therefore I expect it to behave like a human, maybe I should say “hello”. Learning is exactly the process that provides confidence that saying hello is a useful thing to do in the presence of such an object.

I think you are still talking about something different. PT is not a “theory of learning” (a description of how real machines acquire information), it is a description of optimal learning, assuming some (unavoidable) assumptions. My argument for this is (1) Jaynes’ desiderata, and (2) the glorious success of the resulting theory, e.g. everything that is good about science can be shown to be an approximation of PT (you’ll perhaps argue that science relies on imagination, but I’ll counter argue that imagination is useless without some means to compare its products to reality, and any efficient imaginative process is very likely derived from experience), and every time science goes wrong can be traced to an important violation of PT.

Wrong, PT is not an efficient way to actually solve all inference problems. Even using PT, a suitable set of assumptions is required to set up each problem – this requires trial and error, subject to the randomness of how nature decides to reveal herself. There is no way around this.

Um, do you want me to start from the beginning again? Recognizing and implementing useful analogies are exactly the functions of PT.

Getting weirder and weirder. May I recommend a dictionary? Am I missing something?

I’m not completely sure what you mean by logical quantification, or why you think PT doesn’t address it. I assume you are referring to statements like the canonical “all men are mortal, Socrates is a man….,” but this kind of reasoning is derived trivially, if I’m not mistaken, from Bayes’ theorem. Please give an example of any situation where learning takes place in a way that PT can’t encompass (honestly, I’m interested).

Actually, Jaynes was quite emphatic that it is not an axiom – there is no attempt to assert its truth, hence ‘desideratum.’ I’m somewhat open to the possibility of a useful extension to PT using complex representation (and I privately predicted that this would be the point at which you would attack my argumentum ad desideratum), but I have yet to see any strong evidence. PT provides expected frequencies, and I’d need very strong evidence to accept the necessity of representing frequencies as imaginary numbers. I confidently predict that any improvement would be merely a result of making the algebra more efficient. In any case, I further predict confidently that whatever works about PT now will not be significantly changed by such a move – wherever PT justifies making a strong connection between probabilities and frequencies, PT always provides the goods.

On the contrary, it is you making the strong assertions, declaring PT to be definitely unable to do the job it sets itself, which you haven’t supported at all.

## Utilitarianism

David,

In your reply to mtraven, you say

I think I understand you here, and if so, there are important ways that I agree, though I’m not sure that your characterization of utilitarianism is strictly correct. I’ll give you my version of non-eternalist non-nihilism (which I call utilitarianism), perhaps you agree with it:

I understand that eternalism here refers to a belief that value is set by some external mechanism, which of course is nonsense. Please note, though, that this is not a necessary (or even plausible) belief of rationalism. I consider myself a rationalist and a utilitarian, but I certainly don’t hold any ‘eternalist’ principle. The word ‘utilitarianism’ may have a narrower meaning in common usage than I am attributing to it, in which case, forgive me (I think that Bentham, for example, did hold some kind of eternalist axioms about how to infer value). In any case, I believe I’m using the only meaning that the word should have: the value of an action is determined by the value of its outcomes.

I accept that very many are confused about the issues here. There is frequently a conflation of the statements “value has no magical origin” and “value does not exist.” The resulting conflict between the obvious fact that value has no magical origin and the obvious existence of value is ‘resolved’ by various tortuous concatenations of logical fallacy (“I don’t want to be a nihilist!” they scream, without noticing the implications of those words). Many professing rationalists do incoherently hold an eternalistic notion of ethics, which you are right to criticize (very much so), but it isn’t a necessary feature of rationalism, in fact the opposite.

Value is evidently a property of minds (and thus evidently exists). Learning about value, therefore, entails inference upon empirical data concerning minds. Inferring what actions will most probably bring us the things we value is similarly an operation on empirical experience. Thus determining ethics is a rational undertaking: to desire something (to attribute value to it) is to desire a means to maximize one’s expectation to achieve it, which is rationality. (We have no omniscient access to the actual value of an action, so we must make do with its expectation.) There is no external or eternal principle here: my values are inside my mind, and when I die, my values disappear with me.

It’s a matter of considerable puzzlement and sadness to me that so many people have difficulty accepting this simple logic, even when it is laid out for them in detail. Far too many scientists, for example, are happy to sit back and say “well that’s the science, what you do with it is another matter, that’s politics,” as if humanity is somehow separate from the ‘normal’ constituents of reality.

## Maybe logic will help

Hi Tom,

You have been scrupulously polite, and if anyone should apologize, it is me to you. And I appreciate your comments, which (though long-ish) are on-topic and thoughtful. I can’t promise to continue this dialog forever, but we do seem to be making some progress, and I’m up for a few more rounds probably. However, I’d like to postpone discussion of utilitarianism until we’ve understood each other’s understanding of probability.

I am as baffled by your thinking that inference and learning are the same thing as you apparently are by my thinking that they are different. Prototypically, inference is deductive and learning is inductive. Entirely different things. Not all usage conforms to the prototypes, of course.

My hypothesis is that you think both inference and learning are fully accounted for by PT (in fact you pretty much say this), in which case they would be the same, because two things equal to the same thing are equal to each other.

However, outside the Bayesian bubble, PT accounts for only a small part of inference, and only a small part of learning, and they are nearly disjoint subjects!

It appears that you have not studied formal logic, so it may be helpful for me to explain some basics. I apologize if this is, in fact, familiar to you.

Formally, the Socrates example has the givens ∀x.man(x)⇒mortal(x) and man(Socrates). There are two steps to the deduction. The first, officially called “instantiation”, infers man(Socrates)⇒mortal(Socrates) from ∀x.man(x)⇒mortal(x). (This “instantiates” the variable x as the constant Socrates.) The second step, which is propositional inference, derives mortal(Socrates) from man(Socrates) and man(Socrates)⇒mortal(Socrates).

Probability theory extends only propositional inference to real values. It has nothing to say about other sorts of inference, such as instantiation.

Now, you might say, “well, that’s trivial! We don’t need a theory of that! Obviously if every man is mortal, then if Socrates is a man he’s mortal!”

But, once you allow instantiation, logical inference is infinitely difficult—uncomputable and then some. Once you allow instantiation, inference encompasses all of mathematics. Clearly, probability theory has nothing at all to say about, for instance, differential equations.

To start to see why logical inference is a door into vastness, note that there are an infinite number of instantiations possible.

In general, inference requires an unbounded number of instantiations, and choosing them can be highly non-obvious (in fact uncomputably hard).
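The two steps can be made concrete in code. Here is a minimal sketch (my own illustration, not anything from the exchange): the propositional step is a one-line modus ponens, and all the work lies in choosing which instantiations to try. In this toy case the candidate set is a single constant; in general it is unbounded, which is exactly the explosion at issue.

```python
def forward_chain(constants, rules, facts):
    """rules: (premise, conclusion) predicate pairs, each implicitly
    universally quantified over a single variable x."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for c in constants:  # instantiation: substitute a constant for x
                if (premise, c) in facts and (conclusion, c) not in facts:
                    facts.add((conclusion, c))  # propositional step (modus ponens)
                    changed = True
    return facts

# Givens: ∀x. man(x) ⇒ mortal(x), plus man(Socrates)
derived = forward_chain(
    constants=["Socrates"],
    rules=[("man", "mortal")],
    facts=[("man", "Socrates")],
)
print(("mortal", "Socrates") in derived)  # True
```

With one constant and one rule the loop is trivial; replace `constants` with all terms constructible in a rich language and the same loop never terminates.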

This is related to the problem of choosing a hypothesis space before probability theory applies. Remember my discussion about the hypotheses that each grain of sand in the universe is the sole cause of manic depression.

I think it’s really important that we agree that probability theory is not a general theory of inference, as a step toward getting a handle on what its actual job is. If you still don’t understand why it’s not, we’ll have to go more into first-order logic.

My point is that PT doesn’t address the issue of where the model comes from at all. Model formation is pre-formal, and PT doesn’t apply. We can’t have a perfect theory of model formation, but we can say lots about it. What we can say will necessarily be non-formal.

My point is not that PT fails to do its job. My point is that its job is a restricted one, and you appear to be trying to make it do all sorts of things that are outside its proper scope. You are also overlooking important intellectual activities, perhaps because PT doesn’t apply to them.

This quote suggests you misunderstood the point I made just above. I am not concerned (here) with weaknesses in PT when applied to its proper domain. I’m pointing out that there are important sorts of intellectual work that are entirely outside its scope.

Now, about “learning.” PT is a formal system. Therefore, if it addresses “learning,” you should be able to say what “learning” is formally. What is your formal definition for “learning”? (Until we establish this, we’re going to talk past each other about that.)

I suspect that your definition of “learning” amounts to Bayesian updating. However, that is not what most people in cognitive psychology, developmental psychology, philosophy of mind, and AI mean by “learning.”

If you redefine “learning” to mean “probabilistic inference”, then probability theory is a pretty good theory of learning! However, hardly anyone is going to take this seriously as “learning” outside the Bayesian bubble.

Re analogies. There is an extensive literature on analogy in several fields of cognitive science. I suspect you know little or nothing of it. Unless or until you do, anything you say about analogy is going to be naive. At risk of being rude, I think you are probably out of your depth here.

This seems extremely unlikely. “Everything that is good about science” is a very vague and broad category, and reducing it to a simple formalism does not seem feasible. This sounds more like a religious tenet than a rigorous, testable truth-claim.

Re Jaynes’ desiderata, you called them “obviously undeniable.” I deny the first one, as do many other people. That’s what I meant when I said “you can’t just assert it as an unquestionable axiom.” (Maybe “axiom” was a poor choice of words.)

## Utilitarianism

I’ve found that utilitarianism works for practical purposes - yes, there are all kinds of weird corner cases like “utilitarianism says we should re-wire everyone’s brains for maximum pleasure” or the various paradoxes that it produces in population ethics… but I don’t particularly care about those, because I’m not in a position where I could re-wire everyone’s brains, so the question of whether or not I should do so is irrelevant. In the kinds of ordinary situations that we tend to encounter in real life, utilitarianism works fine, and that’s enough in my book.

## Non-statistical learning

I held off on answering this question, in hopes that we could come to agreement first that PT doesn’t encompass predicate logic. But, on consideration, maybe it’s useful to give an example now.

A typical example is grammar induction. This is a formal model of the problem faced by a human learning a language. You are given a set of sentences in some language you don’t know (Chinese, let’s say); your job is to find a “good” grammar that produces those sentences. A grammar, formally, is a set of production rules; if you’re unfamiliar with that formalism, you can think of it as a program. There’s a trivial solution, which is the grammar that produces all possible sequences of words. That doesn’t count, so there’s some sort of formal “goodness” criterion, which says “yes” or “no” to candidate grammars. The grammar has to meet the criterion and generate all the given sentences.

The number of possible grammars is infinite, and in practice it grows extremely rapidly with the number of rules you include. The main job of any algorithm for this problem is to control that combinatorial explosion. It will propose rule sets and then evaluate them to see if they cover the data and meet the goodness criterion. Ways of guessing plausible rules are critical.
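A hypothetical toy version of that search, with every detail invented for illustration: a tiny inventory of candidate production rules, a “goodness” criterion of “use at most k rules”, and a depth bound on derivations. Even at this scale the search is brute-force enumeration of rule subsets.

```python
from itertools import combinations, product

def expand(grammar, symbol, depth=4):
    """All word sequences derivable from `symbol`, within a depth bound."""
    if depth == 0:
        return set()
    if symbol not in grammar:            # terminal word
        return {(symbol,)}
    out = set()
    for rhs in grammar[symbol]:
        parts = [expand(grammar, s, depth - 1) for s in rhs]
        if all(parts):
            for combo in product(*parts):
                out.add(tuple(w for part in combo for w in part))
    return out

def covers(grammar, sentences, start="S"):
    lang = expand(grammar, start)
    return all(tuple(s) in lang for s in sentences)

def induce(sentences, candidates, max_rules):
    """Brute-force search over rule subsets: the combinatorial
    explosion described in the text."""
    for k in range(1, max_rules + 1):
        for subset in combinations(candidates, k):
            grammar = {}
            for lhs, rhs in subset:
                grammar.setdefault(lhs, []).append(rhs)
            if covers(grammar, sentences):
                return subset
    return None

candidate_rules = [
    ("S", ("NP", "VP")),
    ("NP", ("dog",)),
    ("NP", ("cat",)),
    ("VP", ("runs",)),
    ("VP", ("dog",)),  # a junk rule the search never needs
]
data = [["dog", "runs"], ["cat", "runs"]]
result = induce(data, candidate_rules, max_rules=4)
print(result)  # the 4-rule grammar covering both sentences
```

With five candidate rules there are only 31 subsets; real rule inventories make this enumeration hopeless, which is why clever rule-proposal matters.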

It’s conceivable that probabilistic/statistical methods could help guide that process. You could consider proposed rules “hypotheses” and score them probabilistically. I don’t know the current state of research in this area; when I last checked 20 years ago, that wasn’t the approach anyone took.

However, even if probability had some role to play in this, it would have to be a small one. Most of the smarts have to be in the rule-proposing algorithm.

This is a formal case in which it’s clear that hypothesis generation is most of the work.

## Probabilistic language acquisition

Probabilistic models of language processing and acquisition is one relevant overview, though it’s back from 2006.

## Probabilistic Language Acquisition

Suppose that instead of a corpus of known grammatical sentences, we have a corpus in which almost all of the sentences are grammatical, but mistakes are possible.

Then it seems obvious that we should call in probabilistic reasoning, and that the case where all of the sentences in the corpus are known to be grammatical is a special case where the probabilities are all 1s or 0s.
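A minimal sketch of that degeneration (my own illustration; the noise parameter `eps` is invented, standing for the probability that any given corpus sentence is a mistake):

```python
def corpus_likelihood(language, corpus, eps):
    """Score a grammar on a corpus: `language` is the set of sentences
    (as tuples) the grammar generates; `eps` is the chance that any
    given corpus sentence is an ungrammatical mistake."""
    p = 1.0
    for sentence in corpus:
        if tuple(sentence) in language:
            p *= (1 - eps)
        else:
            p *= eps          # only a mistake could have produced it
    return p

lang = {("dog", "runs"), ("cat", "runs")}
clean = [["dog", "runs"], ["cat", "runs"]]
noisy = clean + [["runs", "dog"]]

print(corpus_likelihood(lang, clean, eps=0.0))   # 1.0: all grammatical
print(corpus_likelihood(lang, noisy, eps=0.0))   # 0.0: hard rejection
print(corpus_likelihood(lang, noisy, eps=0.05))  # small but nonzero
```

With `eps = 0` the score collapses to the all-or-nothing membership check; with `eps > 0` a grammar can survive a few mistakes, which is the probabilistic generalization described above.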

I do agree that the benefit from making the hypothesis generation system clever is larger than making the scoring system clever. I don’t have much trouble seeing this as an approximation of PT, though: you’re guessing where the probabilities will be high, and then looking there to check.

I do wonder if this is mostly a discussion over terminology, and if so, to what extent refining the terminology is useful. This is, I think, a reflection of my seeing “probability theory” as a large component of “reasoning under uncertainty” which is deeply connected to the rest of “reasoning under uncertainty”. The deep connections make it hard to draw crisp lines separating disciplines, and make me pessimistic about the benefits of attempting to draw crisp lines. The question of whether tractable approximations of PT fall under PT sounds to me like the question of whether Machine Learning is a branch of statistics and I have a hard time caring about both questions.

To use an example from decision analysis (DA), the scope of DA goes from a problem description in natural language to a recommended action, but this covers a wide range of obstacles and techniques.

The easiest part of DA to discuss is the math, specifically expected value estimates and VNM-utility, because it’s math and thus easily formalizable. A hard part of DA to formalize is how to make formal models of real-world situations, and to populate those models with probability estimates from real-world experts who are generally unfamiliar with probabilistic thinking. Another hard part of DA is determining what the objective function actually is (decision-making under certainty is often hard!). An even harder part of DA to formalize is how to look at a situation and recognize alternatives which may be better choices than any of the known choices.

And so when someone reads about expected value estimates and sees a bunch of neat math which presumes neat inputs, they say “DA doesn’t solve my messy problem!”. Well, VNM doesn’t solve your messy problem, but it does point towards the right parts of that messy problem to solve (namely, hunting for a procedure that gets the necessary neat inputs from your messy problem). Instead of endlessly listing pros and cons, trying to compress different values onto the same scale makes decision-making much easier. DA as a whole does have messy parts, and so DA can solve messy problems.

I don’t think it makes sense to walk around saying that VNM-utility solves all decision-making problems. VNM-utility is a useful component of decision-making, and a good guide as to what techniques should and shouldn’t make it into your decision-making toolbox, but it’s not the entirety of DA. So if your point is “these particular formulas aren’t everything you need for reasoning under uncertainty,” then I think we’re in agreement, and I think the claim you’re objecting to is meant to be interpreted as “formal reasoning under uncertainty is the overarching framework which all thought should fit in (often as an approximation).”

Galileo comes to mind: “Measure what is measurable, and make measurable what is not so.” The growth in usefulness of DA as a field is that people have recognized the parts that are not formal yet, and worked on formalizing them. Elicitation of probabilities is in much better shape now than it was 20 years ago, and I expect that in another 20 years it will be even better. If hypothesis formulation is where probabilistic epistemology seems weakest today, well, roll up your sleeves and start making it less wrong.

## Probabilistic learning

So I looked a bit more for theories of probabilistic learning and found, among other things, a paper named How to Grow a Mind: Statistics, Structure, and Abstraction. I’d be curious about whether or not David feels that it contradicts his claim that probability theory contributes nothing to the model generation part.

In particular, the section from the subheading “The Origins of Abstract Knowledge” onwards argues that hierarchical Bayesian models can discover the best structural representations for organizing the data that they perceive:

## Less Wrong draft

Kaj, Vaniver—Thank you both for the follow up. I’ll need some time to read the papers and respond.

Meanwhile, I’ve drafted the first of a sequence of articles on limitations of probabilistic reasoning for LessWrong. I’m hesitating about whether to post it, because it’s a bit technical and I’m not sure there aren’t embarrassingly stupid errors in it. Specifically, I pose a decision theory paradox, and then use the Ap distribution from Jaynes’ chapter 18 to resolve it. I don’t know whether this paradox is commonly known in the literature, or if there’s some other obviously better way of dealing with it.

Would either (or both) of you be up for reviewing the draft? If so, we can figure out some way of getting it to you privately.

Thanks in advance!

## Floating obliviously

Hi David

I accept that inference includes matters of mathematics and logic, and that these are not contained in PT. Further, I’m happy to concede that PT is incapable of anything without these necessary components. I consider inference to be the drawing of conclusions by reasoning from assumed premises. Usually, when I think of inference, I naturally think of inference as being about something, which excludes pure mathematics and logic. If I’ve inadvertently claimed PT as a general theory of inference, this was mistaken, and not my intention. But I stand by the claim that to attempt to draw inferences about the real world is to attempt to at least replicate the function of PT. You might be the world’s top expert in differential equations, but without some capacity to make at least approximate probability assignments, those equations will allow you to say nothing informative about the real world. Nothing, for example, that will enhance the effectiveness of your decision making. This is my central claim, and that (I presume) of the prototypical Bayesian that you criticize.

You were right that I’m not well acquainted with formal logic, so thanks for the explanation of instantiation. Forgive my ongoing ignorance, but what you’ve described seems to be achievable with PT: given P(mortal | man) = 1, and P(man | socrates) = 1, it trivially follows that P(mortal | socrates) = 1. Anyway, I’m not sure that logic is really the topic of this debate.

The problem that you describe as emerging from the process of instantiation: logical inference becoming “infinitely difficult – uncomputable”… I really don’t understand how you can cite that as evidence for the shortcomings of PT. You seem to link it to hypothesis generation. Ultimately, the problem of hypothesis generation must come down to trial and error. To get started, we just need to assume some form of symmetry in nature. This assumption is a priori sound (actually this doesn’t matter), as without symmetry, there can be no physics. Going from naïve trial and error to some kind of guided trial and error is itself made possible by learning from experience.

Please don’t be in the least concerned about being rude. We can proceed better by being clear.

I freely admit that I’m ignorant of most of cognitive science, which no doubt has many different classifications of analogies, and models of how they are generated and manipulated in minds, but, floating obliviously above a bottomless ocean, I’ll stubbornly stick to my guns that the symmetries that PT investigates are exactly analogies. Stanford Encyclopedia of Philosophy seems to understand what I mean.

I feel like I’m repeating myself a lot, but if we do this in enough ways, maybe we’ll come to understand each other.

Are you claiming there are things you can say about model formation that are examples of things that can be learned about the real world in a way that isn’t capturable by PT? Please expand!

This was supposed to be an approximate statement, I’m not claiming a rigorous proof. I mean that we can look into many of the elements of scientific method and find this property of striving for the Bayesian ideal as common to all. Again, if you can offer a counter example, that would be great. This would be your strongest argument, but so far, it remains lacking.

Then it would be a service to me to explain why you think frequencies are best modeled as complex numbers.

## Probabilistic language acquisition

Kaj, replying to your first comment on this topic:

Thanks for the 2006 review article. I read it pretty carefully, and it mentions only one paper that is relevant to grammar induction, their footnote 47, which is Klein and Manning (2004). I’ve skimmed that paper. It’s pretty complicated, and I don’t want to do a proper analysis unless something important turns on it. The upshot is that they use probability, which I think is a good idea, but the algorithm is complex and there’s a lot of other stuff going on.

I introduced the topic of grammar induction because Tom asked for “any situation where learning takes place in a way that PT can’t encompass.” I assumed that by “encompass” we’d mean “can do the whole job.” PT doesn’t do the whole job in the 2004 paper, although it plays some role.

## Tenenbaum paper

Re the Tenenbaum et al. paper:

First, they’re combining probabilistic and symbolic methods; this is explicitly not a pure-Bayesian approach.

Second, I think this is a good idea. The final pieces of work I did in AI were of this sort, motivated by the same arguments they give in this paper. I spent about a year combining stats with symbolic stuff in various ways before concluding it wasn’t going to work. However, if you held a gun to my head and insisted that I try to make progress in AI today, that’s where I would start.

Third, 20 years later, I still don’t think it’s going to work. The case studies are “toy examples”, and I doubt the methods will scale. (Of course, one can’t be certain about this.)

Last, about your actual question… does it contradict my claim that probability theory contributes nothing to the model generation part. No, it doesn’t, but I should clarify what I meant by that.

In their toy examples, all the work of formalization has already been done. The input data are highly abstracted (a small number of strings, each of only about a dozen bits, in fact). The space of possible models is fully defined by a specific simple formalism. What remains is to choose one model out of that space.

Where I’m denying that PT (or any other formal method) is applicable is in problem formulation, i.e. turning real-world situations into a formalism.

## Is this just definitional?

Vaniver—You raise several points… First, though, thank you very much for reading my draft! I’ve incorporated all your suggestions for improvement.

Re your first three paragraphs. Any problem at all can be expressed as “produce an x such that P(x)”, for some predicate P. For instance, “produce an x such that x^2 = 8,618,632,628,501,281”, or “produce an action plan such that executing it brings about world peace.” Then we can generalize this probabilistically: “produce an x such that P(x) holds with probability > 0.999”. For instance, the probability that 92,836,591^2 = 8,618,632,628,501,281 is better than 0.999. To solve the planning problem, you can just enumerate all possible courses of action, and check each until you find one whose probability of producing world peace > 0.999.
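The arithmetic in that example checks out, and the “produce an x such that P(x)” framing can be run literally as enumerate-and-check (the search range is narrowed here purely so the demo finishes quickly):

```python
# Verify the square from the example above.
assert 92_836_591 ** 2 == 8_618_632_628_501_281

def produce(candidates, P):
    """Return the first candidate x such that P(x) holds."""
    for x in candidates:
        if P(x):
            return x

# Blind enumeration: no probabilistic reasoning is doing any work here.
root = produce(range(92_000_000, 93_000_000),
               lambda x: x * x == 8_618_632_628_501_281)
print(root)  # 92836591
```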

I hope it’s obvious that in these cases probability isn’t doing any work at all, even though I’ve stated the problem probabilistically. So, the fact that you can wrap “with high probability” around the grammar induction problem doesn’t mean that probabilistic inference is a good way to approach it. (Although I think it probably is part of a good way to approach it; whereas it’s not a good way to find square roots!)

That’s worth exploring! However, my fundamental point is that PT is a formal method with limited applicability, has no magic special status among formal methods, and is incapable of addressing non-formalized problems. From the following, it looks like maybe you agree:

We could go on to discuss the question of exactly what PT’s domain of applicability is. But that’s partly what my draft LessWrong series is about; and maybe it’s better discussed there when/if I post that.

Yes—these are much the same sort of “problem formulation” issues I alluded to in the OP! I find them very interesting.

## Probability Theory - 'The facts on the ground'

Here are ‘facts on the ground’ about the current state of probability theory:

-Despite a huge surge in popularity of Bayesian methods since the 90s, there have been no spectacular breakthroughs in artificial general intelligence. Despite tens of thousands of the world’s best and brightest researchers in cog-sci and stats deploying Bayesian methods consistently for more than 20 years, these methods have produced no spectacular breakthroughs in scientific progress; there is no sudden surge in Nobel prize winners deploying Bayesian methods, and there is no evidence that AGI is imminent.

-The main success of probabilistic methods to date has been prediction; in cognitive science the main success has been the Memory-Prediction framework as popularized by Jeff Hawkins (‘On Intelligence’, 2004). But as top physicist and science philosopher David Deutsch comprehensively explains in his popular books (‘The Fabric Of Reality’, ‘The Beginning Of Infinity’), prediction is not the primary role of science (the central role of science is ‘explanation’, not ‘prediction’). The idea that all of scientific inference is an approximation to probability theory is an article of faith. As AI researcher Ben Goertzel argues, formal Bayesian methods have no broad track record.

-There is no doubt that some of what the brain does in inference could be described as approximating probability theory. But there is still much to be learned; the idea that everything the brain does is an approximation to probability theory is an article of faith.

-Probability theory can’t handle mathematical reasoning. On-going research at MIRI has not yet shown that it can be generalized enough to deal with mathematical reasoning. Claims that it can be done are articles of faith.

-Model formulation and knowledge representation is a key problem which probability theory has not yet got to grips with. Claims that ‘hierarchical Bayesian models’ and ‘model checking’ etc., will be able to solve these problems are articles of faith.

## Re: Tenenbaum

David, thanks for your comments on the Tenenbaum et al. paper. The way they described their results made it sound like their models would be a lot more general than what you say, but I guess that’s always the case with AI. And all of the most impressive results are always on toy problems that don’t scale. (SHRDLU, anyone?) :)

Actually, I should probably have dug into their references to check for the scale of the examples, myself, before bothering you with it. Sorry about that.

I just contacted you with the Approaching Aro form.

## Probability, inference, and the real world

Tom—Our backgrounds are different enough that we really do have difficulty understanding each other. Interesting.

This seems like a simple contradiction to me. What in the following analysis seems incorrect to you?

We can use differential equations to make inferences about the real world. E.g. we can estimate temperatures using the heat equation.

Maybe your point is that there is always some “slop” between a real-world object and its formal characterization in terms of the heat equation. I think this is a point we agree on violently; it’s one of the main ideas I want to get across, too.

Maybe where we primarily disagree is that you think probability theory necessarily captures that slop, and I think it necessarily doesn’t, and we both think this is important.

What?? I never said anything at all like that.

## State of the art

Marc—Thank you very much for that summary!

I agree with it, and it condenses several of the points I’ve tried to make at greater length.

(I don’t know the work of Hawkins, Deutsch, or Goertzel, so I don’t have an opinion on that particular paragraph.)

## I've forgotten what we're arguing about

Only by making use of probability theory, or some surrogate! Differential equations provide no flow of information about the real world - they only refer to abstract entities. Differential equations allow me to learn things about x’s and y’s, but those x’s and y’s don’t exist - not even in your head, they are at best only represented in your head.

As I have said, I don’t think this. For example, every hypothesis in our search space might be false - and in fact “probably” is (!). We could even be wildly wrong (e.g. a simulation hypothesis is true, but we never come to suspect this). My point, which is supported by highly compelling theoretical arguments, is that no other process, in principle, can handle that slop better than PT. In fact, the elegance with which PT (and its many approximations) allows us to escape complete epistemological crisis is a wonderful thing, entirely.

Now, I have sympathy with your objection that statements about the supremacy of PT sound like a quasi-religious faith, but let’s be practical. I accept that there is a non-zero possibility that the Earth doesn’t go around the Sun (e.g. simulation hypothesis, or simply, dirty lying astronomers), but I don’t need a syllogism employing 100% known premises to say confidently in a loud clear voice: “the Earth goes around the Sun.” Nobody reasonable will accuse me of blind faith for that. Even a syllogism, by the way, won’t prove that I’m not insane - any demands for absolute proof are simply misguided. Give me a reason to doubt my position, though, and I’ll listen.

Re complex numbers: I thought you said that you deny the assumption that probabilities should be represented by real numbers. Since probabilities model frequencies (though not in the naive way thought by many frequentists), I was curious to know why you think frequencies might be better modeled as complex.

## Going around in circles

Tom—We do now seem to be repeating the same points without making sense to each other. Maybe we’re approaching the limit of what’s possible through dialog.

No, I denied “the reliability of propositions to be encoded using real numbers.” There are two misunderstandings here.

First, I absolutely would not suggest using complex numbers instead. I’ve no idea what that would mean.

Second, “reliability” and “probability” are not the same thing. Probabilities are real numbers by definition. Reliability may or may not best be encoded as a real number, depending on circumstances.

My impression, again, is that you are systematically confusing uncertainty (a feature of the real world) with probability (a formalism).

## PT and inference

David,

I have no idea of whether this is what Tom has in mind, but it occurs to me that there could exist at least one way of reconciling “differential equations allow you to make inferences about the real world” with “to attempt to draw inferences about the real world is to attempt to at least replicate the function of PT”.

It involves kind of treating differential equations as a black box. In essence, you ask, “I have this particular tool that I have previously used to make predictions about the world; how certain am I that this tool works for that task, either in general or in this particular situation?”. Then you look at the track record that differential equations have, and use PT to estimate the probability that you will get the correct result this time around.

One might argue that (in a non-conscious and informal way) human children learn to do something like this: they experiment with different ways of learning about the world (asking their parents, asking their older siblings, asking that uncle who likes to pull their leg, trying things out themselves, taking the stuff that’s said in comic as gospel, etc.). Some of those end up providing better results, some of them provide worse results. And (some approximation of) probability theory is ultimately used to figure out which of these methods have worked the best, and which should be used further.

Of course, in real life we also rely on all kinds of different logical arguments and causal theories instead of just looking at the track record of something… but one could argue that the reliability of different logical arguments and causal theories, too, is verified using probability theory.

Again, I don’t know whether this is anything like the thing that Tom has in mind, but it would seem like a somewhat plausible argument for subsuming all inference under PT…
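That track-record proposal can be sketched as a conjugate Beta-Binomial update; a minimal illustration only (the function name and numbers are my own invention, not anything from the thread):

```python
def posterior_reliability(successes, failures, alpha=1.0, beta=1.0):
    """Beta-Binomial update: start from a Beta(alpha, beta) prior over a
    tool's chance of giving the right answer, fold in its track record,
    and return the posterior mean of that chance."""
    return (alpha + successes) / (alpha + beta + successes + failures)

# A method (differential equations, asking the leg-pulling uncle, ...)
# that has been right 9 times out of 10 gets, under a uniform prior:
print(posterior_reliability(9, 1))  # 10/12, about 0.83
```

The same mechanism ranks the child's competing methods: whichever source has the best posterior mean gets consulted first next time.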

## Another critique of bayesianism (one a little bit less "nice")

http://plover.net/~bonds/cultofbayes.html

This critique strikes at Bayesianism from a different angle. It's not very nice, but I think it's spot-on too.

I have to admit… I've been addicted to LessWrong in the past. Whatever I now think about the actual power of Bayes and probability theory, and whatever I've always thought about the singularity and cryonics (a delusion), I think the website has a lot of good ideas, many of them lacking sources, but good nonetheless. Those good ideas add up to confidence, and the halo effect applies in full force. I also have to admit it's really easy to attribute to Bayes powers that the theorem does not actually have. For example, I have explained some deductions I've made by appeal to Bayes' theorem, such as figuring out the sexual orientation of people from small characteristics that people with certain orientations tend to have: it's true that Bayes explains why this isn't a logical fallacy, but the real reason I figured those things out was entirely the intuitive pattern matching of the sort everybody has.
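For what it's worth, the sense in which such cue-based inference is "not a logical fallacy" is just Bayes' rule in odds form; a generic sketch (the function and all numbers are mine, purely illustrative):

```python
def bayes_update(prior, likelihood_ratio):
    """Odds-form Bayes' rule: posterior odds = prior odds x likelihood ratio."""
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)

# A weak cue that is three times likelier under the hypothesis moves
# a 10% prior only to 25%: evidence, not certainty.
print(bayes_update(0.10, 3.0))
```

The theorem licenses the update; it says nothing about where the likelihood ratio comes from, which is the intuitive pattern-matching part.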

There are also many bad ideas. I think the worst is the denial of the importance of politics. I'd wager there is no such thing as an apolitical person, and nobody on LW is actually apolitical; rather, they are in general conservative. The fact that people seem to think the best thing we can do with "rationality" is "optimizing charities" is really telling.

## What are the x's and y's?

Kaj,

Surprised this needs clarifying. I suppose it's further evidence that I haven't communicated effectively.

Confirming the formalism (or usefulness) of differential equations, as a field of mathematics, is not really what I had in mind, though you are perfectly right, it is part of it. (Arguably, this normally works the other way around: we choose the axioms that seem to fit our experience, and derive the formalism from them - in a way, by the time the maths is developed, much of the confirmation has already taken place.)

I meant more, though, that to say anything about the world using differential equations, I must know the form of the equations that effectively model the process I’m interested in, and (depending on the nature of the problem) I have to know a good candidate set of coefficients. No amount of mathematical brilliance will jump this gap, if there is no empirical data to work on. PT, and its surrogates, are the only tools translating empirical experience into information that can be used in this way.

PT bridges the gap from x’s and y’s to things in the real world. It’s not perfect, but it’s the best we can hope for.

Hope this is clearer.

## Cult of Bayes

Niv—Thanks for that link. I agree with a lot of what he says, although not everything. Definitely anyone involved with LessWrong would do well to read it.

For the sake of completeness, Alexander Kruel has also written extensive critiques of LW at http://kruel.co/2012/07/17/miri-lesswrong-critiques-index/ , with a separate collection of LessNice stuff at http://kruel.co/mirilesswrong-mockery-index/ .

On the whole, I think Kruel’s critiques are more careful and accurate than the “Cult of Bayes” article, although I don’t agree with everything he says either.

One thing that makes this difficult is that Bayesians are diverse, ranging from (perhaps) evil insane people to kind reasonable people. And people’s views also change over time, so even individuals are moving targets.

There's maybe some Bayesians whose position is just "probability is sometimes useful and the Bayesian approach to statistics is sometimes helpful," and I'm in agreement with that. And there's maybe some whose position is "we need to grab control of the US government in order to establish a rationalist utopia," and I'm completely opposed to that. (These two positions may both be more extreme than anyone involved with LW actually holds; I don't know.)

The point of my first post was just that I'm struggling with how to approach this, and whether or how I could be useful. I'm still not sure about that. Currently I'm drafting a series of *technical* articles about the *technical* limitations of Bayes, to post on LW, since maybe that's something LW readers can hear. But I'm not sure whether this is worthwhile.

## Probability theory and practice

Nice quote from the gospel of Jaynes, chapter 18:

## Nice quote

How is this different to what I've been saying all along? There is a hypothesis to which I'm afraid I'm having to assign ever greater weight: that you have not stopped to honestly scrutinize either your position or the one you criticize.

## How is this different?

Tom—I understand you as saying that any formalism *other* than PT can only be connected to the real world by using PT. PT has a special status, among formalisms, by being the only one that can do that.

My understanding is that Jaynes is saying that PT doesn't do that *either*. Maybe that's not what he meant. In any case, I don't think PT has that special status. It gets connected to the real world the same way any other formalism does.

## Best possible

PT provides the *best possible* connection to the real world. Jaynes doesn't contradict this. You seem to be stuck in a false dichotomy. I never claimed that the connection would be free of uncertainty. In fact, I explicitly contradicted this position, several times. PT allows management of uncertainty.

## with all due respect tom...

with all due respect tom… you seem to keep repeating that David just does not get it. but perhaps it is you who does not get it.

every cult or cult-like belief system has what I call "defenses against opposing beliefs". for example, "the devil confuses the hearts of the people against god" is one. the lesswrong defense is "I'm smarter, therefore I'm more likely to believe you're wrong rather than me".

## pt does not give you the best

pt does not give you the best possible connection to the real world. pt is unconnected to the real world; it's up to you to make the connection between the real world and the mathematical model!

for example… I think pt would aid me if I said that I find it unlikely that SI can actually reach the singularity, given that they have done nothing to advance AI in baby steps, by constructing something marginally useful today. why? because I think that if the people at SI actually knew their shit, it would be probable that they could and would have done it. the fact that they didn't is a good sign that they don't. but that's my model, right?

## Polya

Have you read Polya’s “Mathematics and Plausible Reasoning”? It overlaps with these ideas quite a bit, and breaks things down to pretty specific things you can actually do. It’s a nice book.

## Polya's "Mathematics and Plausible Reasoning"

Wow, thank you very much! No, I didn’t know about that. I do know his How to Solve It, which was a classic when I was a math student. Mathematics and Plausible Reasoning was apparently first published after I left the field. It looks really good, from the table of contents on Amazon!

## An idea I've been developing concerning informal reasoning

I could bore you with the details of how I came to think about this, but I thought I’d get right to the point about how I feel humans reason things out.

Brains can, by ‘design’, only hold a finite number of ‘things’ at any given time. The way we typically get around that limitation is by grouping these ‘things’ together and then referring to the grouping as a ‘thing’, thereby allowing us to reduce the number of ‘things’ held in mind at any one time.

I call these things elements of ‘context’. The defining aspect of a context is that it fits neatly into our minds and can be worked with using various ‘tools’. I worked out that all of these ‘tools’ can be boiled down to a short sampling of ‘verses’. I have identified four primary verses. Inverse, converse, reverse, obverse. They kind of fit together like yin and yang. The inverse is the stuff ‘around’ the context, the reverse is all the stuff that’s not the context. The converse is the part of the reverse that relates somehow to the context, and the obverse is the rest of the reverse of the context.

All words we use invoke the verses of context in various ways. The verses are the basic building blocks of meaning; they are how we construct reality. You can determine how smart someone is by how fluidly and usefully they can model a problem in 'contextual space' and then apply forms of reasoning (which again boil down to complex applications of context verses, here called 'logic') to get somewhere nobody's really thought of yet.

I’ve put a lot of thought into this over the last few years, and intend on eventually putting together a web application that’s similar to mind-mapping software, but organized around verses. One might choose to represent the “United States” in context logic. The first step would be to identify various elements of “United States of America”. So you might start with the states. Eventually you might realize that you need “The Star Spangled Banner” in there too. So you group the states together as elements of “States of the United States of America”. So “USA” has one element “States”, which itself, as a first class element of context in its own right, has fifty elements of its own. Elements are part of a context’s inverse.

It’s a contrived example, but it doesn’t have to be. Once you start exploring the converse and the obverse, things can get interesting quickly. What are things that are not the USA? Well, the rest of the countries, of course. And then the converse might be, “the relationship of the USA to other countries.” You can take each of these definitions, and start throwing things into the elements buckets. What are elements of the relationship between the United States and Mexico?

Eventually you might find that, subconsciously, you're actually exploring a contextual 'world': say, "international relations of North America". The entire graph you've constructed can be titled as such, and now you have a complex element of context of the kind humans typically reason with informally. Then you can proceed with defining contexts with that topic in mind.

Our usual way of communicating information is through a device we call a narrative. Blog posts, articles, encyclopedia entries, and the like. It’s all about reducing the contextual elements to that which is important, and then presenting them in comforting ‘story mode’. But there’s nothing really unimportant about the things we omit for the sake of a narrative. A logic graph offers, in my opinion, a tantalizing ‘explorative’ mode of interacting with knowledge, like browsing Wikipedia, but also allows you to display, graphically, the crucial aspect of how things relate to each other. Narrative formation can make it difficult to determine what’s really the most important aspect of the story. What the story-teller intends to communicate is often very different from what the person hearing the story takes away from it.

But a knowledge graph, interspersed with links to narrative-style content like Wikipedia Pages, allows someone to represent contexts as he or she sees them, as directly as possible.
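Here is one hypothetical way the four "verses" could be turned into a data structure. Everything below (the class, method names, and the relation used) is my own guess at the idea, not the commenter's design:

```python
class Context:
    """A sketch of a 'context' node: 'elements' holds its parts (the inverse),
    and converse/obverse are computed against a universe of other contexts."""

    def __init__(self, name, related_to=None):
        self.name = name
        self.elements = []                      # the inverse: stuff 'around' the context
        self.related_to = related_to or (lambda other: False)

    def add(self, *contexts):
        """Group sub-contexts under this one (e.g. States under USA)."""
        self.elements.extend(contexts)
        return self

    def reverse(self, universe):
        """Everything in the universe that is not this context."""
        return [c for c in universe if c is not self]

    def converse(self, universe):
        """The part of the reverse that relates to this context."""
        return [c for c in self.reverse(universe) if self.related_to(c)]

    def obverse(self, universe):
        """The rest of the reverse."""
        return [c for c in self.reverse(universe) if not self.related_to(c)]

usa = Context("USA", related_to=lambda c: c.name in {"Mexico", "Canada"})
usa.add(Context("States of the United States of America"))
universe = [usa, Context("Mexico"), Context("Canada"), Context("Japan")]
print([c.name for c in usa.converse(universe)])  # the neighbours it relates to
print([c.name for c in usa.obverse(universe)])   # the rest
```

A mind-mapping application built on this would let each element (e.g. "States") be explored as a first-class context of its own, as described above.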

## A follow-up page

A new page, “Probability theory does not extend logic,” addresses some of the issues we discussed in this comment thread.

It shows why probability theory is not a complete theory of rationality, and explains Cox’s Theorem in some detail.

## Jerry Weinberg

The stuff about problem formulation vs problem solution, the "anvils" and "collect your bag of tricks" reminds me of what Gerald Weinberg teaches in the experiential workshop *Problem Solving Leadership*, and his *Secrets of Consulting* books (particularly the second one, with the idea of creating a personal toolkit of problem-solving strategies, building on the work of Virginia Satir).

## Feynman quotes

I was wondering if the Feynman cryogenics quote actually came from his visit to a general relativity conference? I mean, I wouldn’t be surprised if Feynman ranted to his wife about more than one conference, but it fits your description quite well:

More here…

Interestingly the field had a bit of a renaissance not long after, despite the lack of experiments (sometimes known as the "golden age of general relativity"). It's still doing pretty well for itself… that quote was from GR3 in Warsaw, and at GR21 in a few weeks the big news will be the recent discovery of gravitational waves.

Oh, and while I'm spouting Feynman quotes, the version I remember of the 'bag of tricks' thing was from *Surely You're Joking…*, where he talks about his 'box of tools' for integrals:

## Feynman quotes

Yes, I think that’s probably the one! Odd that I misremembered the field.

That’s a bit embarrassing… unless there was an exogenous intellectual shock?

Ahah! I *thought* I'd read him saying something like that. Thank you very much for the quote!

## Assuming the 100% rational...

I have a few questions for Tom…

Can Bayesianism, or probability theory, etc., account for human *irrationality*, *confirmation bias*, and the like in thinking?

Also, can these probabilistic ways of thinking account for the presence (and the sum) of knowledge gaps? (As in, no single human possesses all-encompassing knowledge about everything.)

Those variables seem to involve unknowns, and how can you quantify an unknown to get an accurate probability when thinking?

To me, these thinking tools are just that: *tools*. Each thinking tool has a purpose, but is limited to that purpose (or those purposes).

Just because you can use a hammer for more than one purpose doesn't make it the best tool, or the right tool, for the job. There isn't a tool that does everything.

## I know almost nothing about

I know almost nothing about maths beyond high school level (I’m just here for the Buddhism) but… I think I know the Feynman quote re: genius tools.

Dan Dennett quotes it in *Intuition Pumps and Other Thinking Tools*.

I think Feynman is the colleague and von Neumann the genius. I like this because Feynman is doing what you're suggesting re: learning other people's styles and tricks.

“A colleague approached one day John Von Neumann with a puzzle that had two paths to a solution, a laborious, complicated calculation and an elegant, Aha!-type solution. This colleague had a theory: in such a case, mathematicians work out the laborious solution while the (lazier, but smarter) physicists pause and find the quick-and-easy solution. Which solution would von Neumann find? You know the sort of puzzle: Two trains, 100 miles apart, are approaching each other on the same track, one going 30 miles per hour, the other going 20 miles per hour. A bird flying 120 miles per hour starts at train A (when they are 100 miles apart), flies to train B, turns around and flies back to the approaching train A, and so forth, until the two trains collide. How far has the bird flown when the collision occurs? “Two hundred and forty miles,” von Neumann answered almost instantly. “Darn,” replied his colleague, “I predicted you’d do it the hard way, summing the infinite series.” “Ay!” von Neumann cried in embarrassment, smiting his forehead. “There’s an easy way!””
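Both routes to the answer check out numerically; a throwaway sketch (function and parameter names are mine):

```python
def bird_distance(gap=100.0, va=30.0, vb=20.0, vbird=120.0, legs=60):
    """The hard way: sum the lengths of the bird's back-and-forth legs."""
    a, b = 0.0, gap            # train positions; they move toward each other
    bird, toward_b = a, True
    total = 0.0
    for _ in range(legs):
        # time until the bird meets the train it is flying toward
        t = (b - bird) / (vbird + vb) if toward_b else (bird - a) / (vbird + va)
        total += vbird * t
        a += va * t            # train A moves right
        b -= vb * t            # train B moves left
        bird = b if toward_b else a
        toward_b = not toward_b
    return total

# The easy way: the trains close at 50 mph, so they collide after 2 hours,
# in which time the 120 mph bird flies 240 miles.
easy = 120.0 * (100.0 / (30.0 + 20.0))
print(easy, bird_distance())  # both come out at 240
```

The series of legs converges geometrically, so a few dozen iterations agree with the easy answer to well below a millionth of a mile.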

## Summing the infinite series

Thanks, that’s a great story!

## Box of tricks again

You were right that Rota also had a ‘box of tricks’ type piece of advice.

I was trying to look up some other Rota quote that I half remembered and found this (spam filter didn’t like the link but should be easily googleable):

Why is it always either Rota or Feynman who says one of these things? I wish it were more common for people to talk about their work like this.

## The taboo on explaining mathematics

Yeah, this is really puzzling.

I have a guess (which is just a guess). It’s that everyone knows that the early 20th century foundational crisis was never actually resolved. It’s just that it’s been most of a century since it caused any practical problems.

And no one wants to risk getting anywhere near it. Everyone knows that real numbers have actual infinities (Cauchy sequences) hidden inside, and no one wants to talk about *that*. Measure 1 of real numbers are uncomputable: [literally] Unnameable Horrors, with squirming rugose tentacles and [literally] countless scuttling legs. If you thought on that too long and too hard, you could open a portal to the Dungeon Dimensions, and shadow monsters from behind the stars would seep through and tear terrible rents in the fabric of spacetime [or at least R^3].

Mathematicians, like everyone else, just want to get on with their work, and not have to think about the Big Picture. If you are working in some subsubsubdomain of low-dimensional functional analysis, the last thing you want to do is worry about what real numbers are. But if you start *talking about* what you are doing, and how, and why… nightmares rush in.

## Rugose

I learned a new word!

Interesting… I would never have considered this explanation, but then I’ve managed to stay amazingly ignorant of the foundations of mathematics (yes I know this is bad). I might read that Chaitin paper you linked on twitter though, thanks for the mention there!

I might tentatively favour a similar idea, though, which also has its tentacles reaching out of the early 20th century… Everyone knows now that mathematics should be Very Logical and Rigorous. Unfortunately the stuff that actually comes into your head while doing maths is not like that (for me at least…), so it’s easy to worry that people will pick it apart for not being rigorous enough.

Don’t know why this would stop all the physicists though, we don’t care quite so much.

I think the situation has got a bit better with more casual blogs where people try and explain how they're actually thinking, rather than what they tidy up for their papers. John Baez's *This Week's Finds in Mathematical Physics* must be the best example, but Tumblr can be surprisingly good because it's full of procrastinating grad students. (But also it's Tumblr, so you have to pick through all the other rubbish.)

Looking forward to more on ethnomethodology too, btw. Another new word for me, even though it's obviously very relevant to what I'm interested in!

## Very Logical and Rigorous

The Chaitin thing is a talk transcript, not a paper; it’s short and informal. He comes across sounding a bit like a crackpot (although everything he says is in fact totally mainstream), so that’s kind of fun.

Yeah, and this is Chaitin's topic. Everyone "knows" this because Hilbert (and Russell and others) tried to remake math to deal with the squid-headed Elder Gods that were spilling out of Cantor's set theory. They figured they could repair the hole in reality. They couldn't, but the squids got bored and went home. And probably now we could stop doing these fastidious rituals to propitiate them!

But we did get computers out of the deal, so maybe that was good.

## Nothing happening

I find a certain enjoyable hilarity in this conversation. Thank you!

Reading this, I realized that I do not often actually know what my mind does when I am working on theoretical physics stuff. Mostly I just stare out of the window, at the screen, or at the whiteboard, and something comes up from somewhere that cannot be described (the Dungeon Dimensions?). Surface thinking is often just meaningless fluff; busywork so that you can convince yourself that you are working. It does seem more unpredictable than rigorous.

However, after one delirious evening of science making, I wrote down this remark:

> The sensation of intellect is in communication with the environment/sense fields like any other sense experience. Similarly, as a warrior chooses his next action based on communication with the situation at hand, a scientist formulates new images, structures and perspectives in communication with the forms and movements of the universe.

## Re: Nothing happening

Yes, this is very close to my experience! I was actually trying to write a post that was partly about this last weekend. Where by ‘post’ I currently mean ‘structureless braindump’ :(

I remember Hadamard's *The Psychology of Invention in the Mathematical Field* being good on this. Might have to give it another look.