Wednesday, December 29, 2010

Predicting, fitting, omitting

I've been meaning to blog about three tenuously related topics: (a) the role of predictions in science and popular culture, the difference between saying that "good theories should make predictions" [true-ish] and "the point of good theories is to make predictions" [untrue], and what this has to do with FiveThirtyEight; (b) Karl Popper as interpreted by Peter Godfrey-Smith, Wittgenstein and Dummett as interpreted by Timothy Gowers, and the extent to which "formalism" is a reasonable approach to the philosophy and/or practice of science; and (c) the many-worlds interpretation of quantum mechanics, why I've largely given up on it, and a moral that I think one ought to draw from the theory of phase transitions and collective phenomena. This post is a long, chaotic, and very incomplete stab at the first two; I'll come back to (c) anon.

1. The point of scientific theories is not to make predictions; it is to put together stories that explain data. Models with explanatory power usually make predictions. When they do not, it is for one of three reasons: (a) the explanatory power was illusory; (b) all the experiments were done a long time ago, and there are several rules of thumb describing them that the model finally puts together, but it makes no new predictions; (c) experiments that would test the model are infeasible. (a) is what is usually meant by a theory lacking predictive power, but in practice this almost always means that the theory needed an inordinate number of fit parameters to retrodict anything, i.e., that a very large number of other theories of the same simplicity would have been equally viable. (This is the case with religious accounts of anything.) As for (b) and (c), they are irrelevant to the goodness of the theory -- though (c) might be a symptom of people trying to shield their theories from experiment, it needn't be. If for political reasons it became impossible to build any further accelerators, that would not reflect on the theories that might have been tested in these putative accelerators.

There are some results approaching the limit of type (b) scattered throughout physics, though I can't think of any pure cases. In general, understanding what the shapes of various graphs have to do with one another tells you something more: at the very least, it tells you what features of a given material make it behave a particular way, and it suggests what other materials you should be looking at. Nevertheless, in my experience it is not true that models are considered less worthwhile as they approach the limit of type (b) -- though for sociological reasons a model of this kind is less likely to stimulate further activity. (A case in point is Wilson's theory of phase transitions, which was understood to have revolutionized physics although, as far as I know, it made no predictions that were verified before Wilson got his Nobel Prize.)

2. Apart from theories of type (b), one is struck by the differences in attitude between scientists and people who really are interested in making predictions. Andrew Gelman had a sociologically interesting post up in November, arguing that it was sensible for Nate Silver to pour reams of probably trivial factors into his election-forecasting model on the assumption that they might help. (Matt Pasienski had made some very similar points in an IM conversation.) From a scientist's perspective, what Silver does is a fairly absurd case of "overfitting" -- one always learns to avoid unnecessary fudge factors, but Silver just sort of heaps them on -- but of course his "model" is meant to forecast elections and not to explain them. If elections could be explained, this would all be rather silly, but the existing models are less than perfect, so arguably it makes sense to hedge one's bets.

3. Which brings us to the question of why overfitting is a bad idea -- and I don't mean egregious overfitting like having as many parameters as data points, but just the vaguely disreputable tendency to throw in random, loosely relevant factors to make your model fit better. I can think of three basic reasons: (i) models with many parameters are hard to use and don't correspond to the kind of simple mental picture that is usually necessary for new creative work; (ii) they leave more stuff unexplained (why are the parameters what they are?); (iii) they are, like the epicycles, increasingly difficult to refute. [One might also mention (iv) that they violate Occam's razor, but I don't think the razor applies here: "necessity" is ill-defined, since after all each added factor does bring one's curves a little closer to one's data.]
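Here is a toy illustration of the remark in (iv) and of why it cuts the wrong way. Adding parameters always brings the curve closer to the data already in hand, but not necessarily closer to the next data point. The sketch is hypothetical: the data are made up, and it has nothing to do with Silver's model or Gelman's post. For visibility it uses the egregious case set aside above, a polynomial with as many parameters as data points, and compares it with a two-parameter straight line.

```python
"""Toy overfitting illustration (hypothetical data, not from the post).

Eight noisy observations of y = 2x + 1 are fit two ways: a 2-parameter
least-squares line and an 8-parameter polynomial through every point.
"""


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def interpolate(xs, ys, x):
    """Lagrange polynomial of degree len(xs) - 1 through every data point."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                basis *= (x - xm) / (xj - xm)
        total += yj * basis
    return total


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the points (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: y = 2x + 1 plus a fixed, made-up +/-0.3 "noise" pattern.
train_x = [float(k) for k in range(8)]
noise = [0.3 * (-1) ** k for k in range(8)]
train_y = [2 * x + 1 + e for x, e in zip(train_x, noise)]

slope, intercept = fit_line(train_x, train_y)


def line(x):
    """Two fitted parameters."""
    return slope * x + intercept


def curve(x):
    """Eight fitted parameters: one per data point."""
    return interpolate(train_x, train_y, x)


line_train_err = mse(line, train_x, train_y)
curve_train_err = mse(curve, train_x, train_y)

# Predict the next, noise-free observation at x = 8 (true value 2*8 + 1 = 17).
line_next = line(8.0)
curve_next = curve(8.0)

if __name__ == "__main__":
    print(f"training MSE: line={line_train_err:.4f}, curve={curve_train_err:.4f}")
    print(f"prediction at x=8 (truth 17): line={line_next:.2f}, curve={curve_next:.2f}")
```

With this particular made-up noise, the interpolating curve has zero training error and the line does not (about 0.0857). Yet the curve predicts about -59.5 at x = 8, while the line predicts about 16.87 against a true value of 17.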

4. Which brings us to Popper as interpreted by Godfrey-Smith. I think the best way to understand the rule against overfitting -- and the related preference for simplicity -- is in these terms:

In this section I will use a distinction between synchronic and diachronic perspectives on evidence. A synchronic theory would describe relations of support within a belief system at a time. A diachronic theory would describe changes over time. It seems reasonable to want to have both kinds of theory. [...] epistemology in the 20th century tended to suppose we could have both kinds of theory, but often with primacy given to the synchronic side. The more novel possibility, which I will discuss in this section, is the primacy of the diachronic side, once we leave the deductive domain. [...] A diachronic view of this kind would describe rational or justified change, or movement, in belief systems. [...]

In this section I suppose that we do not, at present, have the right framework for developing such a view. But we can trace a tradition of sketches, inklings, and glimpses of such a view in a minority tradition within late 19th and 20th century epistemology. The main figures I have in mind here are Peirce (1878), Reichenbach (1938), and Popper. This feature of Popper's view is visible especially in a context where he gets into apparent trouble. This is the question of the epistemic status of well-tested scientific theories that have survived many attempts to refute them. Philosophers usually want to say, in these cases, that the theory has not been proven, but it has been shown to have some other desirable epistemic property. The theory has been confirmed; it is well-supported; we would be justified in having a reasonably high degree of confidence in its truth.

In situations like this, Popper always seemed to be saying something inadequate. For Popper, we cannot regard the theory as confirmed or justified. It has survived testing to date, but it remains provisional. The right thing to do is test it further. So when Popper is asked a question about the present snapshot, about where we are now, he answers in terms of how we got to our present location and how we should move on from there in the future. The only thing Popper will say about the snapshot is that our present theoretical conjectures are not inconsistent with some accepted piece of data. That is saying something, but it is very weak. So in Popper we have a weak synchronic constraint, and a richer and more specific theory of movements. What we can say about our current conjecture is that it is embedded in a good process.

Occamism has been very hard to justify on epistemological grounds. Why should we think that a simpler theory is more likely to be true? Once again there can be an appeal to pragmatic considerations, but again they seem very unhelpful with the epistemological questions.

From a diachronic point of view, simplicity preferences take on a quite different role. Simplicity does not give us reason to believe a theory is true, but a simplicity preference is part of a good rule of motion. Our rule is to start simple and expect to get pushed elsewhere. Suppose instead we began with a more complex theory. It is no less likely to be true than the simple one, but the process of being pushed from old to new views by incoming data is less straightforward. Simple theories are good places from which to initiate the dynamic process that is characteristic of theory development in science.

5. Which, finally, brings us to Gowers --
I would like to advance a rather cheeky thesis: that modern mathematicians are formalists, even if they profess otherwise, and that it is good that they are. [...] When mathematicians discuss unsolved problems, what they are doing is not so much trying to uncover the truth as trying to find proofs. Suppose somebody suggests an approach to an unsolved problem that involves proving an intermediate lemma. It is common to hear assessments such as, "Well, your lemma certainly looks true, but it is very similar to the following unsolved problem that is known to be hard," or, "What makes you think that the lemma isn't more or less equivalent to the whole problem?" The probable truth and apparent relevance of the lemma are basic minimal requirements, but what matters more is whether it forms part of a realistic-looking research strategy, and what that means is that one should be able to imagine, however dimly, an argument that involves it. 

This resonates with me because I've always had a strong formalist streak; it goes with the Godfrey-Smith quote because formalism in mathematics is a diachronic perspective -- it says, "mathematics is a set of rules for replacing certain strings of symbols with others" -- and I think a diachronic philosophy of physics would have some appealing resemblances to formalism. I was going to explain how I think a diachronic perspective and the effective field theory program might affect how one thinks about, say, the many-worlds interpretation of quantum mechanics, but this post is already far too long.


Grobstein said...

I do not really like the argument of 5. Here are some thoughts as to why:

1) The observation amounts to this: mathematicians communicate among themselves on the viability of proof concepts, independent of and sometimes to the exclusion of the truth of the target proposition. This shows I think that the enterprise of mathematics is largely to prove things, which is not the same thing as identifying true propositions. (Maybe better: the enterprise of mathematics at a given time is to prove things given the proofs available at that time.)

2) I don't want to therefore conclude that mathematicians "are" formalists. Platonists could a) define their goal as discovering true statements, b) argue that finding proofs is hopefully an efficient way of discovering true statements, c) explain the "formalist" talk as simply the practical implementation of a) and b). And of course when there is

3) So we might simply say that Gowers's point is "sociologically interesting" but not philosophically interesting.

I assume I am refusing to get the point. Despite taking that horrible class at Amherst I do not have a very confident sense of what these positions mean. I haven't read the longer Gowers.


Zed said...

Oh I agree; in fact I considered a disclaimer about that. The argument proves too much because any research usually involves _doing_ something rather than convincing yourself that something is true. But there is a difference in my experience between mathematicians and physicists re willingness to assume generally-believed-but-not-proven statements, which would suggest that mathematicians are "more formalist." (In general, people who believe that the conclusions of proofs are _truths_ should be less proof-fixated, as there are presumably other ways of glimpsing truths. But these might just be very uncertain...)

I think the answer to your (3) is in the intro to the Gowers essay where he remarks that he's a "naturalist," i.e., a believer 'that a proper philosophical account of mathematics should be grounded in the actual practice of mathematicians. In fact, I should confess that I am a fan of the later Wittgenstein, and I broadly agree with his statement that "the meaning of a word is its use in the language".' Which is probably begging the Q.

What I thought was valuable in (5) was the implied association between formalism and the diachronic attitude, which I think is useful. There is an intellectual kinship between a doctrine that says "the rule of motion in mathematics is to replace strings with other strings according to certain rules" and one that says "the rule of motion in physics is to take hypotheses and test them." On both of these you evade questions about your level of confidence in the actual _statements_ of the theory at any given time, which is probably a good thing.

Grobstein said...

Yes. To an outsider, the difference you identify b/w mathematicians and physicists seems to correspond to the differences in available methodologies between the disciplines, which would then be covered under the notional disclaimer about practicalities. Physicists, I should think, have a wider range of ways to test the truth of a proposition and so less reason to rely on "formalisms."

I very much like these gestures towards diachronic accounts although I reserve some doubts as to what they mean.

Zed said...

It depends on whether/how far you buy the "use determines meaning" thesis. If one acknowledges that there are/have been no mathematical methods beyond proof, the notion of "truth" as separate from proof is redundant on this view. (And maybe on others.) In particular it would be impossible to learn what, independent of there being a proof, it might mean for something to be true.

I'm somewhat allergic to the notion of truth, so I am drawn to the notion that science consists of possibly meaningless (but workable!) rules for updating theories in response to experiments. The next step is to justify "why these rules rather than others," but this seems easy to do using the usual (Peirce-type) pragmatic arguments. This is not where PGS wants to go with the notion -- I think he wants to say that the scientific method is justifiable on Bayesian-convergence grounds -- but, like, fuck PGS.

Grobstein said...

I suppose if it is possible to describe the enterprise of mathematics without reference to truth, then the appeal of Platonism is rather diminished. I have always loosely felt that formalism (and any kind of deductivism) is unhelpful here, because the truth of the meta-level statements is no less mysterious than the "truth" of the object statements.

Zed said...

I don't understand your remark about the object vs. meta-language. On deductivism the only "truths" are that certain strings can be replaced by certain other strings by applying a set of rules. These truths seem less mysterious than, e.g., the continuum hypothesis.
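The "strings for strings" picture can be made concrete with Hofstadter's MIU system from Gödel, Escher, Bach. It is chosen here only as a standard toy example and is not discussed in the post. Its "theorems" are simply the strings reachable from the axiom MI by four rewrite rules. The sketch below enumerates the ones reachable without ever exceeding a length bound; `max_len` is an assumption added to keep the search finite.

```python
from collections import deque


def successors(s):
    """All strings obtainable from s by one application of an MIU rule."""
    out = set()
    if s.endswith("I"):              # Rule I: xI -> xIU
        out.add(s + "U")
    if s.startswith("M"):            # Rule II: Mx -> Mxx
        out.add(s + s[1:])
    for i in range(len(s) - 2):      # Rule III: replace any III with U
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):      # Rule IV: delete any UU
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out


def theorems(axiom="MI", max_len=8):
    """Breadth-first search from the axiom, discarding strings longer than
    max_len. The result is only a finite slice of the theorems: strings that
    can be reached only via longer intermediate strings are missed."""
    seen = {axiom}
    queue = deque([axiom])
    while queue:
        for t in successors(queue.popleft()):
            if len(t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


if __name__ == "__main__":
    found = theorems()
    print(len(found), "strings found; MU among them:", "MU" in found)
```

On this view, "is MU a theorem?" is purely a question about reachability. The bounded search cannot settle it alone, but an invariant does. The number of I's starts at 1, rule II doubles it, rule III reduces it by 3, and the other rules leave it unchanged, so it is never divisible by 3. MU, with zero I's, is therefore unreachable.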

Grobstein said...

Can the continuum hypothesis be stated as a proposition about which strings can replace which others? I should think so but have not worked it out. If so, anyway, that equivalence (for me) tends to mystify deduction, rather than de-mystifying the CH. I suppose I don't find it de-mystifying that (if) proofs and problems can be described in wholly mechanical terms. I've thought of them that way for as long as I can remember, but I don't feel like it makes me a formalist. (Again, perhaps I do not know what a formalist is.)

If the unattractive thing about Platonism is that you are positing the existence of (or eternal truths about) numbers, which are these ill-defined objects we have no direct access to, well, positing eternal truths about proofs instead does not really help. Proofs can even be put in a 1-1 relationship with numbers, I think.

Zed said...

You can write down the continuum hypothesis as a statement in ZFC, or you can write "ZFC implies CH" as a statement in first-order logic. The second is a "strings for strings" assertion. The first is true in some models of ZFC and false in others (by the results of Gödel and Cohen, assuming ZFC is consistent); the second is false. The "mystery" arises when you ask whether the CH is true or false in the "real model" of ZFC, i.e., the one that describes the universe of eternal truths. In Platonism there is a fact of the matter about CH, or the ax. of choice, but it is unclear how we are to access that fact; it is also unclear how, when we've convinced ourselves one way or the other, we are to persuade others of our point of view. (This is where Gowers's comments on the ax. of choice come in.)

I do not think it is true that proofs can be put in a 1-1 _structure-preserving_ correspondence with numbers, i.e., there is no natural isomorphism. Consider the simpler case of sentences and numbers. You could find the Gödel numbers of two sentences, add them, and convert back to the sentence language, but this would not correspond to a logically valid (or meaningful) operation on the sentences. The reason this matters is that for you to show that proofs are "like" numbers it's not sufficient just to have the cardinalities of the sets match.
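Here is a small sketch of the point about adding Gödel numbers. It uses a toy prime-power numbering over a hypothetical six-symbol alphabet; the symbol codes are arbitrary choices for illustration, not a standard encoding. Each sentence round-trips through its number, but the sum of two sentences' numbers generally does not encode any string at all.

```python
"""Toy prime-power Gödel numbering (hypothetical alphabet and codes)."""

SYMBOLS = {"0": 1, "S": 2, "=": 3, "+": 4, "(": 5, ")": 6}
CODES = {code: sym for sym, code in SYMBOLS.items()}


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine for short strings)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def encode(sentence):
    """Gödel number: product over positions i of p_i ** code(symbol_i)."""
    n = 1
    for p, ch in zip(primes(), sentence):
        n *= p ** SYMBOLS[ch]
    return n


def decode(n):
    """Invert encode(); return None if n is not the number of any string."""
    if n < 1:
        return None
    chars = []
    for p in primes():
        if n == 1:
            return "".join(chars)
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if exponent not in CODES:
            return None  # a gap in the prime sequence, or no such symbol code
        chars.append(CODES[exponent])


if __name__ == "__main__":
    a, b = encode("0=0"), encode("S0=S0")
    print(a, b, decode(a), decode(b), decode(a + b))
```

Here encode("0=0") is 270 and encode("S0=S0") is 808500. Their sum, 808770 = 2 * 3 * 5 * 26959, has no factor of 7, so the decoder hits a gap in the prime sequence and reports that the sum encodes nothing.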

Formalism is (as I understand it) the notion that you shouldn't ask proof-system-independent questions about the truth of (infinitary) mathematical statements. What you gain by this move is that first-order logic (say) is much better behaved than mathematics; in particular, it has a completeness theorem, so the kinds of paradoxes that can arise in first-order logic are pretty limited.

windwheel said...

The scandal of diachronicity is that, if you accept Evolution, it is a co-ordination problem.
Since heuristics for co-ordination problems are heavily selected for (indeed, a lot is hardwired in language acquisition), diachronicity's problematics are going to be discourse-independent, i.e., you get the same effects and aporias in, say, lit crit as in quantum physics.
Faced with the co-ordination problem of selecting which titty bar we should bunk off to from the Seminar, so as to get to hang out with the playahs without seeming uncool, my guess is that bifurcations in the choice tree have to have the property of killing off their virtually identical twin. This sounds like René Girard's theory of 'mimetic desire', and something like it appears in one form or another among European intellectuals of Popper's period.
In other words, a cultural artefact, specific to a certain deeply troubled historical milieu, is poisoning second order discourse.
A simpler way to go would be just restating the problem in terms of Evolutionary Game Theory.

Zed said...

It's important to separate explanations of why our beliefs come about from justifications of them. (As we know, many evolved-for heuristics are bad.) An evolutionary mechanism per se justifies nothing; what I find interesting about PGS's argument (or diachronicity + appeal to pragmatism) is the notion that you should look for justifications of belief-finding rules rather than of beliefs (cf. act vs. rule utilitarianism). (I agree with your other remark about grue/evolution, but that is a different sort of Q., being based on a factual puzzle.)