Creating Information
by Jared* on April 8th, 2006
From time to time I run into the claim that the processes of evolution cannot add new information to the genome. It’s time to talk about why this is not the case.
First let’s get some things straight: DNA is not simply a code. Yes, it codes for proteins, but it has physical/chemical properties of its own. Some proteins recognize and can bind to specific sequences of DNA–a fact that is routinely used in laboratories to manipulate DNA by cutting it in specific places. Proteins are composed of amino acids, each of which has its own physical/chemical properties. It is these properties that give a protein its overall shape and function. So altering the sequence of DNA can change how proteins interact with it, or if the change is in a stretch that codes for a protein, it can change the sequence of amino acids, which in turn can alter the properties of the protein.
Sometimes letters composing words and sentences are used to illustrate changes in DNA or amino acid sequence. It is a useful analogy but it does have limitations. Words have meaning because we assign the meaning to them whereas proteins act on other proteins and molecules as a result of physical and chemical properties. Their “meaning” is dependent on context.
There are several ways that new genes can be created.
1. Gene Duplication: There are a couple of ways that genes can be duplicated–for our purposes we’ll just say that copying errors or recombination can each lead to duplication of genes. With duplicate genes there are several potential outcomes. Two simple scenarios involve the duplicate copy either mutating into useless sequence, or mutating and taking on a different function–either way the original copy is able to carry out the original function. There are numerous families of genes in our genome–families of genes that are a result of gene duplication and modification. Examples include the globin genes (including myoglobin and hemoglobin) as well as Hox genes which control developmental processes.
2. Exon Shuffling: In eukaryotes (like us) many genes are broken up into units called exons, while the sequences between the exons are called introns. Introns are spliced out before translation into protein occurs. Through various mechanisms, an exon of one gene can be tacked on to (or into) another gene, leading to a novel chimeric gene.
3. Mobile Elements: These can be thought of as stretches of selfish DNA that copy and paste themselves throughout the genome. If they land in the middle of a gene they can alter the way the sequence is spliced, thus altering the amino acid sequence of the protein. Alternatively, if they land near a gene they may alter how, when, or where the gene is turned on or off (see amylase below).
4. Lateral Gene Transfer: This is not as common in us, but genes can be moved from one organism to another (by a virus, for example).
5. Sequence Coding: Also less common, sometimes sequence in or adjacent to a gene can change such that the gene is extended by a number of amino acids.
6. Gene Fusion/Fission: Two genes can be fused into one, or one gene split into two.
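As a rough illustration of the first mechanism in the list (duplication followed by divergence), here is a toy Python sketch. The gene sequence and the chosen mutation are made up for the example, and real duplication and mutation are far messier than string operations; the only point is that after the copy diverges, the genome encodes two different proteins where it used to encode one.

```python
# Standard genetic code, written compactly in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def point_mutation(dna: str, position: int, new_base: str) -> str:
    """Return a copy of dna with the base at `position` replaced."""
    return dna[:position] + new_base + dna[position + 1:]


# Hypothetical six-codon gene, invented for illustration only.
original_gene = "ATGGCTGAAAAAGATTAA"

genome = [original_gene]
genome.append(original_gene)                   # duplication: two identical copies
genome[1] = point_mutation(genome[1], 7, "T")  # the copy diverges: GAA -> GTA

proteins = [translate(gene) for gene in genome]
print(proteins)  # ['MAEKD*', 'MAVKD*']
```

The original copy still produces the original protein, while the duplicate now produces a variant with one amino acid changed (E to V). Whether such a variant is useful depends entirely on its chemistry and context, which this sketch does not model.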
In addition to the creation of new genes, the regulators of gene expression can be altered. For example, in a previous post I dealt with the amylase enzyme and how it came to be expressed in the mouth (in addition to the pancreas). In that case, a gene duplication occurred, followed by the insertion of “junk” DNA in front of one of the genes. That “junk” DNA acts like a switch and is responsible for the expression of amylase in the mouth. It is a case of an apparently accidental occurrence that happens to be useful.
Thus we see that new information can be generated in the genome. Whether it has all happened strictly by natural processes is another question–and one that is probably ultimately unanswerable by scientific means. Nevertheless, we should not underestimate the creative processes that already exist within the genome.
Nature Reviews Genetics 4, 865-875 (2003); The Origin of New Genes: Glimpses from the Young and Old. (Unfortunately this article is not freely available.)
For additional information you can search the books available here.
Jared, I have nothing to add except my compliments for a well-written review!
Thanks, BrianJ!
Yes, genetic processes are very interesting and can certainly lead to all sorts of interesting side effects. But a proper discussion of “creating information” should be much more careful about the sense used of both “create” and “information”, because any successful argument in this field hangs critically on semantic precision regarding both.
In particular, there are a variety of related, but incompatible information metrics. Statistical thermodynamics alone has two of them - “information” measures the amount of knowledge one has about the detailed state of the system. “entropy” on the other hand measures the additional amount of information an omniscient observer would have about the state of the system. Or in other words, thermodynamic information is a measure of knowledge and thermodynamic entropy is a measure of ignorance. And yet they are both measured in the same units - ‘bits’ or something that reduces to bits.
When we analyze a genetic process and discover new sequences are being ‘created’, then thermodynamically speaking (or according to the very similar metrics of conventional information theory) we must ask whether these sequences are new “information” or new “entropy”, or whether the process conserves both.
Unfortunately, the second law of thermodynamics doesn’t allow for new information at all, because information and entropy are inversely correlated and entropy always stays the same or increases (by natural processes at any rate). Indeed, if the second law of thermodynamics is true, any increase in information must strictly be considered either supernatural or impossible.
Resorting to an outside energy source will not do either - if the system is the whole universe, there is no external energy source and available information about the system either remains the same or decreases.
Therefore no natural process can produce net information - not in the way either thermodynamics or conventional information theory defines it at any rate. Randomness doesn’t count - any truly random process produces entropy, not information.
But one might protest - the process of natural selection amplifies an infinitesimally small percentage of random mutations into large scale biological structure. However the same information theoretical arguments apply - the second law says no net macroscopically available information can be produced in a closed system. So where did this biological success story get its information from?
According to a coherent theory of natural selection, the information comes not from random processes per se, but rather is extracted from the environment. Or in other words effective, self-sustaining, ordered biological complexity of the type that we can measure and study can only be derived from pre-existing macroscopic complexity of the surrounding environment.
Any other sort of evolution is forbidden by the statistical form of the second law of thermodynamics. The idea that you can ever get macroscopically available, ordered and biologically useful complexity from an initial state that consists of a random collection of atoms is thermodynamically and statistically untenable.
I should mention that macroscopic in a thermodynamic sense simply means large or predominant enough that one can measure its structure without destroying it or the effect it has on the state of the system. One might easily analyze one molecular re-configuration - i.e. moving atoms around with an electron microscope, but one cannot track all of them, let alone their entropy bearing nuclear and electronic vibrations…
At least not without perfect knowledge regarding the initial state of the system, of course…
Mark (#3), I appreciate your concern for “semantic precision.” I am unfamiliar with the manner in which you are using some terms, so I’m hoping you can help me out a bit.
I am not familiar with mathematically precise definitions of ‘information.’ Your definition of it as “the amount of knowledge one has about the detailed state of the system” seems strange to me because it sounds like it depends on the existence of a conscious person. Can you put this definition into operational and mathematical terms?
Your definition of ‘entropy’ sounds similarly strange to me, with its reference to “an omniscient observer.” The definitions of entropy I’m acquainted with are Eq. (1) here and the first equation here. Are you using one of these definitions, or some other mathematical formula you can help me understand in operational terms? And then could you help me understand the mathematical relationship between ‘information’ and ‘entropy’?
I would understand if what I am asking for would take more than a blog comment; if so I would be interested in being pointed to a source that could answer my questions.
Mark,
I will readily concede that if this discussion gets mathematically technical, I will fall flat. I have not studied information theory and I’m limited in my math skills. But from what I have read, I can’t quite buy what you are saying–at least not what I think you are saying.
Good Math, Bad Math did several recent posts on information theory. (Here, here, and here.) It doesn’t seem to match what you are saying.
I have not argued for abiogenesis here, and while I would not rule it out as a possibility, that seems to be what you are attacking rather than, say, fish giving rise to tetrapods.
Christian (#6), yes - information, entropy, probability, etc. in general are observer dependent metrics - the observer doesn’t have to be “conscious” - the constraints apply equally to a computer or similar machine. Statistical methods in general are not required in the presence of perfect information - no unknowns. In statistical mechanics, information is the measure of what the observer / experimenter knows about the state of the system, and entropy is a measure of what is not known. Similar definitions apply to communication between a sender and a receiver over a noisy channel - a problem which was the motivation for much of modern information theory.
This of course raises the question: why should we care about a concept that has an epistemological character? The answer is that it has been shown to be an extremely powerful tool for distinguishing the overwhelmingly likely from the infinitesimally unlikely, in such a regular fashion that no one has ever been able to demonstrate a macroscopic exception.
There is little or nothing that prevents physical processes from running in reverse. So if I open a barrier to let two disparate gases mix, how come they never spontaneously unmix? The answer is it is not impossible, but rather exceedingly unlikely. And unlikely here means that of all the different possible configurations of gas molecules, configurations that lead to the two gases spontaneously un-mixing are infinitesimally rare, even though they are typically sign-flipped versions of configurations (system states) of gases mixing in the forward direction.
From an information theory perspective, we started out with some information (e.g. the molecules of gas A are on the left, and gas B on the right) and a large amount of entropy - the unknown trajectories and vibrations of the gas molecules. After we open the barrier and the gases have mixed, we have less information, and entropy (the amount of knowledge we lack about the detailed state of the system) has gone up. The simple way to think about this is that uncertainty (statistical “fuzz”) spreads and increases with time.
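To put a rough number on "exceedingly unlikely", here is a minimal Python sketch under a strong simplifying assumption that is not from the comment itself: each molecule is treated as independently and equally likely to be found in either half of the container. Under that assumption the chance of finding N molecules all back on their original sides is (1/2)^N, which is computed in log form to avoid underflow.

```python
import math


def log10_prob_all_on_original_side(n_molecules: float) -> float:
    """log10 of (1/2)**n_molecules, assuming each molecule is independently
    equally likely to be in either half (an idealization)."""
    return -n_molecules * math.log10(2)


for n in (10, 100, 6.022e23):
    print(f"N = {n:g}: probability ~ 10^{log10_prob_all_on_original_side(n):.3g}")
```

For ten molecules the probability is about one in a thousand, so small systems do fluctuate back. For a mole of gas the exponent itself is of order 10^23, which is why macroscopic unmixing is never observed even though nothing forbids it.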
Suppose you have complete control over the initial state of the universe - if you just throw out a random collection of atoms or particles and watch what happens, the likelihood that anything particularly interesting happens is essentially non-existent. A random configuration is statistically indistinguishable from an unknown one. Any order and structure that appears has to be derived from another source - which in the case of a simple physical system is the laws of physics, symmetries, and boundary conditions.
For example, atoms have a predictable structure that derives from spherical symmetry (rotation invariance), the properties of fundamental particles, and Schrödinger’s equation. If you have a different physical setup, you can get different ‘pretty pictures’ (e.g. snowflakes) but you never get information out of such a system that you did not put in. Fractals, for example, have no more true information content than the equation used to generate them (cf. Kolmogorov complexity).
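The Kolmogorov-complexity point can be illustrated, very roughly, with an ordinary compressor. Kolmogorov complexity itself is uncomputable, so the sketch below uses zlib's compressed size as a crude upper-bound stand-in; the two test strings are invented for the example. A long string generated by a short rule compresses to almost nothing, while a pseudo-random string of the same length barely compresses at all.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


n = 100_000

# Generated by a one-line rule: the same four letters repeated.
patterned = b"ACGT" * (n // 4)

# Pseudo-random bytes from a seeded generator (seeded only for repeatability).
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(n))

patterned_size = compressed_size(patterned)
noisy_size = compressed_size(noisy)
print(patterned_size, noisy_size)
```

Note what this metric rewards: the random-looking string counts as having the most "information" in the compressibility sense, which is one reason the different information measures discussed in this thread should not be swapped for one another casually.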
Information theoretic constraints are most easily applied to the conditions of abiogenesis (which is ridiculously unlikely on the order of the probability of a person spontaneously rematerializing on the other side of a brick wall), but the analysis is generally applicable if proper care is taken. There is plenty of information (ordered available macroscopic structure) in the environment to get reflected in all sorts of reasonable morphological changes, but whether the environment has the necessary information content to get reflected in the development of biological subsystems without any known or conceivable antecedents in the organism itself is fairly open to question, and in particular rigorous statistical analysis.
[A bigger entry box would sure be nice…]
Just to review for spelling errors, not to make even more long winded comments of course.
Just to add to Mark’s excellent comments, I think Christian that if you think back to your statistical mechanics you’ll directly see the relationship between knowledge and information. The other thing to keep in mind (sorry, I can’t recall how far you went in grad school, so forgive me if this mistakenly comes across wrong) is Noether’s theorem. Indeed developing all of thermodynamics in terms of symmetries is a very useful process.
The one, perhaps small caveat, to Mark’s point is that I’m not entirely convinced that universally - either spatially or temporally - we ought assume the laws of thermodynamics hold. I don’t want to make this as an argument based upon a naive appeal to Noether’s theorem applied to cosmic background radiation, as some have done in the past. (i.e. that since we don’t have temporal symmetry due to the nature of the “freezing” of the universe from the big bang, we can’t have energy conservation and thereby perhaps issues in entropy) Things are much more complex than that in cosmology. I do think, however, that the exact boundaries of cosmological thermodynamics are perhaps more open, or at least less understood than some assume. (i.e. clearly we can’t just take the universe in terms of general relativity and thermodynamics - although heaven knows that in itself is an interesting topic)
Ask and ye shall receive. At least sometimes: the comment box is now larger.
Mark, I am not finding you persuasive because you are using terms (e.g. “entropy”) that have standard mathematical and physical definitions, but your discussion seems to be carried on in a way that is both qualitative and idiosyncratic. For instance the posts Jared referred to, which seem consonant with what little I know of the standard lore of information theory, describe ‘(information) entropy’ and ‘information’ as the same thing, not two sort of inversely related things as your description of free expansion of a gas suggests. (Moreover, I don’t get the necessity of observers thing at all. The reference to observers, which appears to be behind this perceived inverse relationship, makes your description seem flaky in addition to being wrong. I must say that your statement “In statistical mechanics, information is the measure of what the observer / experimenter knows about the state of the system, and entropy is a measure of what is not known” is simply not resonating with anything I learned in three or so semesters of study of the subject.)
With the standard physics definition of entropy, I know how to compute the entropy change associated with the free expansion of an ideal gas (and it has nothing to do with observers). What I don’t know is how you are defining the putative ‘information’ in this gas, and how to compute the supposed information change in its free expansion.
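For reference, the standard textbook result for the free expansion of an ideal gas is ΔS = nR ln(V_final/V_initial). The Python sketch below evaluates it; the function names are made up for the example. The second function is an interpretive add-on rather than part of the standard formula: dividing the entropy change by N·k·ln 2 re-expresses it in bits per molecule, which is one conventional way of connecting the thermodynamic number to "missing information" about which part of the volume each molecule occupies.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def free_expansion_entropy(n_mol: float, v_initial: float, v_final: float) -> float:
    """Entropy change in J/K for free expansion of an ideal gas."""
    return n_mol * R * math.log(v_final / v_initial)


def entropy_change_bits_per_molecule(v_initial: float, v_final: float) -> float:
    """The same change expressed as bits per molecule: Delta S / (N k ln 2)."""
    return math.log2(v_final / v_initial)


delta_s = free_expansion_entropy(1.0, 1.0, 2.0)
bits = entropy_change_bits_per_molecule(1.0, 2.0)
print(f"{delta_s:.3f} J/K, {bits:.1f} bit per molecule")
```

For one mole doubling its volume this gives about 5.76 J/K, or one bit per molecule. The calculation itself makes no reference to an observer; whether the "bits" reading adds anything is exactly what is being disputed here.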
Christian, the place of observers makes sense when you think of the appeal to information in a string. Consider what makes a string meaningful so we can talk about redundancy in the string. It only makes sense if we have an observer who gives it meaning. Even traditional senses of entropy in traditional computer compression (say the zip algorithm which many of us are familiar with) sees meaning in terms (ultimately) of ascii characters and what it takes to represent them.
As for entropy and information, you probably should read Shannon’s seminal introduction to information theory. It’s very easy to read and is a classic.
You might wish to check out this discussion of Shannon I found googling. It does a nice job of relating information entropy and thermodynamic entropy. It also does a nice job explaining the role of an observer and understanding.
Clark, Mark seems to know a lot of the lingo and to have had some exposure to physics, but there seems to be considerable conceptual muddling. And no, thinking back on statistical mechanics courses is not helping me understand him.
I am familiar with Noether’s theorem, but I don’t see what it has to do with the discussion here.
Clark, after reading the link you provided I stand by what I’ve said. Money quote: “More information means more [Shannon] entropy,” directly opposite of what Mark said. Clearly he’s read plenty, but his understanding of it seems FUBAR.
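The Shannon measure referred to in that quote is H = -Σ p·log2(p), summed over the possible symbols or outcomes. A minimal Python sketch, with example distributions chosen only for illustration:

```python
import math


def shannon_entropy(probs) -> float:
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


fair_coin = shannon_entropy([0.5, 0.5])
biased_coin = shannon_entropy([0.9, 0.1])
four_equal = shannon_entropy([0.25] * 4)
certain = shannon_entropy([1.0])
print(fair_coin, round(biased_coin, 3), four_equal, certain)
```

A source with more equally likely outcomes has higher H, and a message from it resolves more uncertainty when received. In this usage "more information" and "more entropy" go together, which is the sense of the quoted line.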
Thomas Schneider is a molecular biologist who uses information theory. His lab page is here.
I’m still trying to get my mind around some of these things, but while digging I found that he is no fan of creationists–Dembski and Behe in particular.
Christian, I guess I’m fuzzy on what it is you’re objecting to. Perhaps the confusion is over information for the observer or information in the system?
Clark, that may be part of the confusion, but indeed I don’t understand why “information for the observer” is brought into the discussion, since standard definitions of entropy and information (the same thing, dammit!) refer, in your phrase, to “information in the system” and do not refer to observers at all.
I’m also objecting to what seems to be a claim by Mark that there is an inverse relationship between ‘information’ and ‘entropy,’ and that he has not provided mathematical definitions of these that make contact with accepted definitions.
Christian, have you ever taken a class that dealt with statistical thermodynamics? My second quarter of thermodynamics at the U. dealt almost exclusively with it.
I made very clear in my first comment in this thread that there are several related information metrics around, and that one could not blindly substitute one for the other. Computer scientists, for example, normally measure information in terms of lossless compressibility. That is not equivalent to the thermodynamic definition of information or entropy.
Thermodynamic information and entropy are inversely correlated. If you doubt this fact I refer you to the definition of information in any decent thermodynamics text. Zemansky and Dittman, Heat and Thermodynamics, Sixth Edition, define thermodynamic information in section 11-9.
Not only that - thermodynamic entropy and information have been shown to be statistical measures with a subjective component - subjective in the same sense that Bayesian probabilities are subjective.
Entropy might look conveniently absolute from the macroscopic perspective of a typical chemist, but on a microscopic scale what is “entropy” and what is “information” (in the statistical thermodynamics sense) is distinguished by what is known to the observer.
Zemansky and Dittman quote Leon Brillouin, a scientist who did much to establish the field of statistical thermodynamics in the 1940s and 50s, as follows:
“Entropy measures the lack of information about the exact state of the system”
Now if you want to make an informed objection to my rough outline of an argument, please do so. But do not state I do not know what I am talking about without doing some cursory fact checking first.
Here is a nice introductory article on the subject:
Entropy and the Laws of Thermodynamics
Principia Cybernetica Web
http://pespmc1.vub.ac.be/ENTRTHER.html
I might also mention that information is also inversely correlated with entropy in standard information theory, i.e. H = -I.
Before jumping to conclusions, might one consider the possibility that one is dealing with something akin to a change of reference frame? Entropy does represent information, just unknown information, rather than known information.
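One way to see how both usages in this thread can coexist is to write the conventions side by side. The block below is a sketch, not a quotation from any of the texts cited: the first two lines are the standard Shannon and Gibbs definitions, and the third is a convention along the lines of Brillouin's "negentropy", assumed here, in which information is measured as the shortfall of entropy from its maximum.

```latex
% Shannon entropy of a distribution p_i, in bits
H = -\sum_i p_i \log_2 p_i

% Gibbs entropy of the same distribution over microstates
S = -k_B \sum_i p_i \ln p_i = (k_B \ln 2)\, H

% Information held about the state, measured relative to maximal ignorance
% (assumed convention, after Brillouin)
I = S_{\max} - S \quad\Longrightarrow\quad \Delta I = -\Delta S
```

Under the first convention "information" names the quantity H itself (what a message would tell you on average); under the third it names what is already known, so it falls as S rises. The sign difference is then a matter of which quantity is being called information, not a disagreement about the mathematics.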
Of course the subjective nature of entropy is not well known - it was not understood until about fifty years ago. Normally thermodynamics is only used on macroscopic systems. However as the scale of the system studied gets smaller the distinction between information and entropy becomes increasingly critical.
It is just easy to mix them up because they are measured in the same units, and the same stochastic pattern can occur either as entropy or as information dependent not only on observer per se, but on how a system is modelled - similar to a gauge transformation in physics, except more consequential.
Another trivial (but very rough) analogy is the frame dependence of electric and magnetic fields. Electric fields transform into magnetic fields, and magnetic fields into electric fields, just by changing to a moving reference frame. Entropy changes into information, and vice versa depending on the level of detail one tracks on each side of the register. “Free energy” is energy about which one has detailed information, “Heat”, on the other hand, is energy about which one does not have detailed information (or about which one chooses not to model in detail). Free energy and information are correlated, as are heat and entropy.
“…abiogenesis (which is ridiculously unlikely on the order of the probability of a person spontaneously rematerializing on the other side of a brick wall)”
Could you offer such a calculation please? One that, presumably, models all possible chemical routes from abiotic to biotic chemistry?
Anon (#21), statistical mechanics is a sufficiently general technique that explicit modeling of chemical routes is unnecessary to predict rates of chemical formation. The basic assumption is that all energetically achievable molecules are created through random thermochemical reactions - i.e. that all energetically possible chemical reactions occur in the system.
While one may legitimately dispute the proper order of magnitude - I should more accurately say something like “on the order of the order of magnitude of…” - it is well established that the spontaneous rate of formation of relevant protein chains, even in a veritable amino acid soup under perfect conditions, is extremely low. And the larger the molecule the shorter the mean lifetime, under thermally enabled conditions of course. The conditions conducive to random formation of large molecules are not conducive to them staying around for very long. Of course a single protein chain does not a successfully self sustaining biological structure make.
The statistical calculations seem very limited in their understanding of biology and chemistry. First, they limit themselves to the billion years between when the earth was formed and when life appeared. This ignores the possibility of billions of earths all having a billion years to make life, and ours was the lucky one that “succeeded.”
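The arithmetic behind the "billions of earths" point is the chance of at least one success in many independent tries, 1 - (1 - p)^N. The Python sketch below uses placeholder numbers that are purely illustrative; nobody knows the real per-planet probability, and the independence of the trials is itself an assumption.

```python
import math


def prob_at_least_one(p_single: float, n_trials: float) -> float:
    """P(at least one success) = 1 - (1 - p)**n for independent trials,
    computed via log1p/expm1 so tiny p and huge n stay numerically stable."""
    return -math.expm1(n_trials * math.log1p(-p_single))


# Placeholder values: a one-in-a-billion chance per planet.
p = 1e-9
one_planet = prob_at_least_one(p, 1)
billion_planets = prob_at_least_one(p, 1e9)
print(f"{one_planet:.3g} {billion_planets:.3f}")
```

With these made-up inputs, an event that is a one-in-a-billion long shot on any single planet becomes more likely than not (about 63%) somewhere among a billion planets. The conclusion is only as good as the inputs, which is the limitation being pointed out.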
But the second thing they ignore I think is more important: they are testing the possibilities of models that are most likely incorrect. For example, many people have modeled the improbability of a random protein that is capable of replicating itself. But most researchers in this field don’t think it was a protein that came first, but rather it was some kind of RNA. Importantly, there are some RNAs around today that replicate themselves, so it is clearly possible.
I was at a conference a few weeks ago where Leslie Orgel (the author of Orgel’s rules) spoke. He threw some more confusion into the “probability pot” by pointing to recent evidence that suggests that perhaps it wasn’t even RNA that came first, but rather a molecule similar to RNA such as TNA or GNA. What makes these so special? TNA is based on a 4-carbon sugar backbone, and 4-carbon sugars are much more easily made spontaneously than the 5-carbon sugar of RNA. GNA uses a glycerol backbone, and glycerol is as abundant as–well, as arguments against abiogenesis. (And I should point out that all of the other necessary building-block molecules to make proteins and RNA have already been shown to occur spontaneously–it was the sugar that was confounding scientists.)
Oh, and another idea being kicked around by scientists in this field is that life could have arrived by meteorite. Then the question is, “How did life begin on the planet from whence the meteorite came?” And the answer, of course, could be, “From some other meteorite.”
I actually like this argument, not from a scientific standpoint but from a theological one. It reminds me of the common question, “But who made God?”
“statistical mechanics is a sufficiently general technique that explicit modeling of chemical routes is unnecessary to predict rates of chemical formation.”
Then can you point me to a published paper where such a calculation was done that justifies saying abiogenesis is “ridiculously unlikely on the order of the probability of a person spontaneously rematerializing on the other side of a brick wall”? If not, can you point me to your calculation? The odds of spontaneous formation of chemical structures in various conditions vary wildly, and all such calculations I’ve ever seen employ some form of “tornado in a junkyard” reasoning (occasionally dressed up in intimidating math), which is why I specifically mentioned modeling all possible routes. That a wealth of peptide chains can form from available conditions is well understood (e.g. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=10477135&query_hl=6&itool=pubmed_docsum). I’m just attempting to figure out how you’ve arrived at your claim so it can be evaluated. I’m not aware of anyone having established that we have the ability to say abiogenesis is so unlikely in all possible conditions, much less those that might properly model places on early Earth.
Brian J, you misunderstand. Statistical mechanics is fundamentally a *mathematical* technique generalizable to any physical system. Its validity in a thermodynamic context is contingent only on the accuracy of extremely well established physical laws. There is no scientific opposition to statistical mechanics - it is far too well founded for that.
Areas of disagreement lie in the formulation of specific problems - in particular disagreements about initial conditions, as well as the applicability of secondary models derived from more fundamental considerations.
I am not saying these calculations are perfect, I am just saying that even rough mental estimates based on these well established statistical models indicate that abiogenesis is overwhelmingly unlikely. Or in other words, given the state of the evidence it is hard to conclude as an article of faith that abiogenesis is the only possibility or even the most likely possibility, no matter how much intellectual fulfilment it gives to atheists.
“Mainstream” biologists have yet to engage these arguments on a serious basis. I see criticism after criticism all the time that either amount to sophistry or contain fundamental errors in reasoning - like equivocating between entropy and information, or leaving out major terms in probability calculations, or worse, arguing as if mere possibility of a process is an end to all argument.
Now the serious ID arguments may well have flaws, but the only way to know for sure is to engage them seriously. The impression one gets is that biologists would prefer never to discuss, research, or analyze the issue in rigorous statistical mechanical / thermodynamic terms at all. That attitude makes key aspects of evolutionary biology look like uninformed speculation of the type that runs rampant in the social sciences.
Anon, I have already admitted my estimate was a rough mental calculation of the order of the order of magnitude of something like that happening. Since biologists do not have any detailed theories for certain steps in abiogenesis (like polymers to protocells) all I can do is assume that such steps must also occur by random chance. However, given a documented model, just that component of the probability calculation could easily vary the result by ten orders of magnitude or more.
The irony is that the ID folks are trying to disprove theories for which there is no detailed evidence. Since the biologists are basically incapable so far of coming up with any sort of non-handwaving theory of how this happened, the ID people have to use much more general methods to show improbability of any such process.
The statistical methods for how to do that for thermochemistry up to the polymer stage are well established. Beyond that the science is still very fuzzy on both sides. However, the ID side has at least as reasonable hope of developing the proper statistical mechanical methods to analyze the problem in general, as the biologists have of documenting, simulating, or duplicating an actual abiogenetic process.
Mark: I am sorry that I was not clear. My fault. I trust statistics and mathematical modeling, but I also recognize their limitation. I think you pointed out that limitation very well: “Areas of disagreement lie in the formulation of specific problems…initial conditions….” It is very likely that the modelers are using the wrong initial conditions, given that the biologists and chemists are still trying to figure them out.
And that is why I would dispute something else you said: “…mental estimates based on these well established statistical models…” Why use well established statistical models on a biochemistry that is not well established?
“it is hard to conclude as an article of faith that abiogenesis is the only possibility” I don’t recall Dr Orgel, or any of the other scientists at the conference, making this conclusion at all. Instead, they are testing the abiogenesis hypothesis to determine whether it is correct or not. Remember, Orgel actually suggested that a meteor might have brought life to earth–if that is indeed true, then abiogenesis for life on earth is clearly wrong.
The focus of the work among the scientists is to 1) Determine what molecules would be necessary for abiogenesis to occur (the biologists take this), and 2) Determine what would be necessary for those molecules to form spontaneously (the job of the chemists). Once they have answers to those questions, then they try to determine the probability of such a thing occurring. I think that is a more rigorous approach than anything I have seen from ID, which wants to jump immediately to the conclusions without the evidence.
“Now the serious ID arguments may well have flaws, but the only way to know for sure is to engage them seriously.” There is already so much literature available that engages ID point by point. I can’t begin to list it.
Jared, I just realized that my last post is way off topic. Sorry about that. Your initial post addressed the common argument that new genetic information cannot be created. My first post was relevant; my last was a thread-jack. Sorry again.
I think that the detour is just fine. I should have stated from the beginning that I was not dealing with information theory. Rather I was taking on the intuitive sense of information.
The discussion has been enlightening–by all means, continue.
“The irony is that the ID folks are trying to disprove theories for which there is no detailed evidence.”
IDists typically - always as best as I can tell - are trying to show that abiogenesis as a natural process is either extremely unlikely or impossible. That’s because they want to take that conclusion to draw an inference to design. The idea is to say it is not “naturally” possible, ergo the designer did it. They are not attacking specific models of abiogenesis. They are attacking the notion altogether. Typically this is done by a general declaration like yours, followed by showing that a single abiogenesis event is unlikely given basic equiprobable chance assumptions.
“all I can do is assume that such steps must also occur by random chance”
Why on earth would you assume such a thing? As I’m sure you understand, the likelihood of chemical reactions is highly dependent on local conditions. All such a calculation would tell you is something about the likelihood of a process few if anyone takes seriously. Such a calculation does not justify any general claims about the likelihood of abiogenesis, like those you made. For all we know, abiogenesis might be near inevitable given existence of certain conditions found throughout the universe that are conducive to some series of spontaneous reactions. We don’t know.
BrianJ, I have failed to clarify an important distinction. Some of the results of statistical mechanics apply universally, and others are approximations. Thermodynamics as a theory came with caveats (e.g. near-equilibrium conditions) for a long time, and similar constraints exist on a wide variety of secondary models.
For the ID program to be scientifically successful, it cannot be a mere matter of debate over the conditions that prevailed at certain times in certain places. Instead proponents must successfully document statistical mechanical constraints comparable in rigor to Noether’s theorem or Liouville’s theorem - note the word _theorem_.
I agree there is an enormous “literature” engaging ID point by point. However, little or none of it engages ID seriously - most of it is little more than rhetoric and cheap tricks mixed with arguments riddled with errors and omissions. A serious engagement requires close scientific analysis of the type that prevails in physics, not just fuzzy heuristic arguments and overweening triumphalism.
A theory that amounts to no more than it might have or could have happened in such and such a way is pure speculation in the absence of supporting evidence. By that standard, abiogenesis is a field still largely in the pre-scientific stage.
Anon (#31), The reason is that I see no scientific basis for assuming otherwise. Probability is a science of rational expectations and what one expects in the absence of evidence one way or the other tends to have a subjective component. Bayesian prior probabilities and all that.
More generally speaking, the basis of the ID dispute is that ID proponents do not see any scientific basis for concluding a priori that agent causation or libertarian free will (LFW) do not exist, where most scientists tend to do the opposite. Scientists tend to assume that there is no way agent causation or LFW can be demonstrated scientifically, where ID proponents are engaged in an effort to do just that - at least on a rational expectations basis - and this effort to challenge one of science’s most cherished assumptions drives many scientists up the wall.
I don’t want to get into an ID food fight, but I just can’t agree with the picture of ID proponents as the poor revolutionaries being oppressed by reigning scientists.
“most of it is little more than rhetoric and cheap tricks mixed with arguments riddled with errors and omissions. A serious engagement requires close scientific analysis of the type that prevails in physics, not just fuzzy heuristic arguments and overweening triumphalism.”
You’re telling me that’s not Behe? It’s not just (or even necessarily) that they challenge certain assumptions that drives people crazy–there’s more to it than that.
“The reason is that I see no scientific basis for assuming otherwise.”
That’s a poor justification. Simply because we don’t assume otherwise doesn’t mean we have good reason to assume a random combinatorial calculation (typically called “tornado in a junkyard” thanks to Hoyle) properly models all the routes to abiogenesis in order for us to make general pronouncements about its probability. All we can say is abiogenesis is unlikely, given those assumptions. The only thing is no one in origins of life research takes that seriously as a model. Vaguely talking about prior probability doesn’t change that. It’s as unjustified an assumption as it would be when calculating the odds of the solar system forming the way it did prior to a theory of gravity.
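For concreteness, the random-combination calculation being criticized here can be sketched in a few lines. This only illustrates what that model assumes (independent, equiprobable draws). The function name and the example values (20 symbol types, 100 positions) are hypothetical, chosen for illustration, and are not figures from this discussion or from origins of life research.

```python
import math

def log10_equiprobable_assembly(alphabet_size: int, length: int) -> float:
    """log10 of the chance of hitting one specific sequence when each of
    `length` positions is drawn independently and uniformly from
    `alphabet_size` symbols (the "tornado in a junkyard" assumption)."""
    return -length * math.log10(alphabet_size)

# Hypothetical illustrative values, not data: 20 symbol types, 100 positions.
print(round(log10_equiprobable_assembly(20, 100), 1))  # -130.1
```

The result describes only this one model. Once the steps are allowed to depend on local conditions, the independence assumption behind the formula no longer holds and the number says nothing about the likelihood of other routes.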
“Scientists tend to assume that there is no way agent causation or LFW can be demonstrated scientifically, where ID proponents are engaged in an effort to do just that - at least on a rational expectations basis - and this effort to challenge one of science’s most cherished assumptions drives many scientists up the wall.”
I don’t think I’ve ever read a single major critical paper of ID that suggested its success or failure is entirely a matter of whether we accept a priori that agent causation or LFW are philosophically coherent and exist. This came out of the blue in relation to my comments. The criticisms of ID are more mundanely about its numerous flaws in matters of representing the state of scientific knowledge and methods of inference.
e.g.
http://www.pandasthumb.org/archives/2004/08/meyers_hopeless_1.html
philosophy.wisc.edu/sober/ID&PRword.PDF
Many of the teleological arguments of IDists don’t even entail anything close to disembodied minds acausally deciding to cause things to poof into existence. *cough* God *cough* You’re treading awfully close to some form of conspiratorial thinking where you explain the failure of ID to persuade the scientific community by accusing scientists of hating the philosophical conclusions it entails. The cute thing is this directly contradicts the “it could be aliens” arguments of Behe et al. Guess not, eh?
For the benefit of the curious, here are a few standard measures of information. Hopefully these definitions will help clarify what I said earlier:
Category I: Loss-less information metrics for discrete systems
A. Raw information capacity - A context free measure of the information capacity of a discrete system constructed as the logarithm of the number of possible system states (omega). When the logarithm is base 2, the result is measured in bits. e.g. a system with N independent variables each with M possible values has omega = M^N and has a raw (context free) information capacity of log2(M^N) = N*log2(M) bits.
B. Discrete system information content - A measure of the information contained in a discrete system. Maximally equivalent to the discrete system information capacity, but normally reduced by performing a loss free algorithmic transformation from the original discrete system representation to an equivalent discrete representation. The actual value in bits depends on the chosen algorithm.
C. Kolmogorov complexity - the length of the smallest bit string that can be used to faithfully represent a discrete system, e.g. using an ideal compression algorithm.
Category II: Communications metrics
D. Raw channel capacity - A measure of the maximal information transfer capacity of a communication channel. Normally measured in bits/sec.
E. Effective channel capacity - measures the information transfer capacity of a band limited communication channel with a specified signal to noise ratio. Channel capacity is linearly proportional to both the logarithm of the signal to noise ratio (~the number of resolvable bits per sample when a base 2 logarithm is used) and to the bandwidth of the channel. (cf. Sampling Theorem).
F. Signal information content - A measure of what the sender actually _signed_. Coherent only in the context of a scheme for distinguishing between noise and data.
G. Signal noise content - A measure of noise and interference present in a signal. Net signal content = signal information content + signal noise content.
Category III: Lossy information metrics
H. Reduced information content - a measure of the raw information contained in an approximate representation of a system or signal. Approximation most commonly done by sampling or discretizing in either time, space, or frequency domains.
Category IV: Thermo-statistical information metrics
I. Maximal information capacity. A measure of the absolute (context free) information capacity of a system constructed as the logarithm of the number of possible system configurations. When logarithm is base 2, measured in bits. Please note that quantum mechanical constraints are the only reason real world systems do not have infinite information capacities.
J. Effective information capacity - A measure of the practical information storage capacity of a system at a given temperature, where the capacity is reduced from the raw information capacity by the thermal noise floor.
K. Uncertainty - A measure of the degree of ignorance an observer has (or model reflects) regarding the exact state of a system, constructed as the logarithm of the number of states possible after adjusting for everything that is known about the state of the system.
L. Information - A measure of the knowledge that a particular observer has (or model contains) regarding the state of a system, typically constructed as the negative integral of the change in uncertainty with regard to a system or component thereof.
M. Absolute information - Measure of information necessary to completely represent the state of a system. Metrically equivalent to the sum of information and uncertainty regarding the exact state of a system.
N. Entropy - Uncertainty normalized according to an adopted convention as to the boundary between what is represented as free energy and what is represented as thermal energy, or macroscopic information and entropy, respectively. The convention adopted in a particular context is generally based on what level of information is available about the detailed state of the system. Equal to k log(omega) in conventional units (J/K), or log2(omega) in bit units, where omega is the number of possible system states after adjusting for the macroscopic information content of the system.
O. Macroscopic information - Measure of the macroscopic information content of a system, normalized the same way as entropy. Macroscopic information at time t is equal to the initial macroscopic information plus the negative integral of the change in entropy. Shared convention governs the precise definition of ‘macroscopic’.
The parallels between these categories should be readily apparent.
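A few of these definitions can be made concrete with a small sketch. The function names are hypothetical, zlib stands in for an ideal compressor (true Kolmogorov complexity is not computable, so the compressed size is only an upper bound), and the channel formula is the standard Shannon-Hartley form C = B*log2(1 + S/N), which is swapped in here as one common way to express effective channel capacity. The number of joint states of N independent M-valued variables is taken to be M^N.

```python
import math
import zlib

def raw_capacity_bits(n_variables: int, n_values: int) -> float:
    """Raw information capacity: log2 of the number of joint states (M**N)."""
    return n_variables * math.log2(n_values)

def compressed_size_bits(data: bytes) -> int:
    """Upper bound on information content: size after lossless compression.
    zlib is an assumption standing in for an ideal compressor."""
    return 8 * len(zlib.compress(data, 9))

def uncertainty_bits(n_possible_states: int) -> float:
    """Uncertainty: log2 of the number of states still possible."""
    return math.log2(n_possible_states)

def information_gain_bits(states_before: int, states_after: int) -> float:
    """Information: the reduction in uncertainty after learning something."""
    return uncertainty_bits(states_before) - uncertainty_bits(states_after)

def shannon_hartley_bits_per_sec(bandwidth_hz: float, snr_linear: float) -> float:
    """Effective channel capacity of a band limited, noisy channel."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Eight independent binary variables hold 8 bits.
print(raw_capacity_bits(8, 2))                        # 8.0
# A highly repetitive message compresses far below its raw size.
repetitive = b"AB" * 500
print(compressed_size_bits(repetitive) < 8 * len(repetitive))  # True
# Learning one of the eight binary variables halves the state count: 1 bit.
print(information_gain_bits(256, 128))                # 1.0
# Illustrative channel: 3 kHz bandwidth, linear signal to noise ratio 1023.
print(shannon_hartley_bits_per_sec(3000.0, 1023.0))   # 30000.0
```

The capacity, uncertainty, and information functions all share the same log2(state count) form, which is the parallel between the categories.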
Anon / BrianJ, we are getting way off topic. I think debating the relative quality of the argument on both sides is largely a waste of time. There is clearly high quality and low quality stuff on both sides of the issue. I have outlined my perception. How am I supposed to defend that? Nothing you or I could say is likely to change our perceptions on such a subjective and experience driven matter to any significant degree.
The problem with debating the additional issues you raise is that probability is again a science of rational expectations. I did not base my personal estimate on a tornado in a junkyard model, because I think that is ridiculous. But I have no other means to estimate the polymer to protocell formation rates or probabilities because there are no mathematical models. So I, like anyone else, have to estimate based on reasonable expectations of what order of magnitude of rates / probabilities a rigorous statistical model would generate.
My best guess is that such models will generate formation rates hundreds of orders of magnitude _larger_ than the “tornado in a junkyard” model but yet still tens of orders of magnitude lower than what would be required for a reasonable expectation of abiogenesis.
But that is strictly an opinion. We will have to wait for a general purpose statistical mechanical model to be developed.
As far as the agent causation / LFW issue is concerned, the fact that these issues haven’t been addressed by biologists in journal papers is further evidence that few if any take ID seriously. If they did, they would have to address the metaphysical issues - as the long term (i.e. positive) argument for ID critically depends on them.
A coherent theory of agent causation is much more subtle than “poof” - the more general field is known as process metaphysics. Alfred North Whitehead, one of the leading mathematicians of the 20th century, did a lot of work in this area. Few such models hypothesize (let alone require) the existence of a supreme being. The existence of God per se is irrelevant to ID, process metaphysics, or agent causation. Theology is not subject to scientific analysis - however process metaphysics is. A successful process metaphysics must be compatible with established physical theories, i.e. duplicate the statistical results of quantum mechanics to experimental precision. And of course to move from the realm of metaphysics to a proper physical theory, it must prove superior.
I would like to return to the very interesting issue of the different senses of information and entropy if anyone is interested. It is certainly more germane to the topic of the post than what we have been talking about.
By the way Anon, it would be nice if your comments contained a lot more arguments and a lot less rhetoric. Express your perceptions on general matters all you want, I certainly do. But certainly ad hominem characterizations are out of line. I don’t know why you are posting anonymously, but your style certainly isn’t winning any points with me.
One last thing - I do not believe the argument I made in #3 and #8 has any general flaws. Or in other words, it should be susceptible to formalization. If anyone can point out anything obviously wrong with the logic (not the numbers) please do so. Some sort of background in statistical thermodynamics or information theory would obviously help. The definitions I listed in #36 should clarify several terms - I believe they are the same as (or closely compatible with) the ones in standard usage.
Actually one more - Christian if you are still paying attention, I think the definitions I listed in #36 directly address some of the questions you previously asked.
In particular the reason why entropy is not normally seen to be observer or model dependent is that the scale difference between what are treated as macroscopic features and what are treated as microscopic (i.e. thermal) features of the system is typically very large. So there is very little ambiguity between the two categories.
The potential for ambiguity gets very serious as the scope of what one models explicitly (e.g. this atom is moving in this direction with this speed) and what one models thermally (e.g. this nucleus has random vibrations characteristic of its thermal environment) gets closer together. That is where the subjective component of statistical thermodynamics comes in.
The extreme case is an “omniscient” observer with infinite computational capacity. Such an observer has no need for thermo-statistical calculations at all (at least if the system is deterministic): just plug the initial conditions in and calculate explicitly, tracing each trajectory or wave function from start to finish.
Heat, temperature, and entropy are all macroscopic statistical concepts. They are very tenuous concepts when one is looking at the level of individual molecules, nor are they required components of a complete physical theory at all. Sufficiently detailed information replaces the need for such concepts completely.
As a practical example, consider the concept of the “temperature” of a hydrogen atom moving at velocity V. Neglecting internal vibrations, does the concept make any sense? The velocity might have been derived through a “thermal” collision with another atom. On the other hand the velocity might largely be due to the motion of a containing flask not far from absolute zero. The difference is inconsequential to the hydrogen atom.
Brillouin et al figured this all out, but most thermodynamicists tend to operate in a zone where the issue is not that significant. However, the difference is absolutely critical when dealing with the more general issues of thermodynamic evolution, time reversal, etc.
For example, physicists are generally at a loss when trying to explain the time asymmetry of the second law of thermodynamics. The fundamental physical theories are generally all time symmetric. So how can something so overwhelmingly pervasive and time asymmetric as the second law of thermodynamics be derived from more fundamental time symmetric equations?
From a statistical mechanical perspective, part of the answer may be simple - the time asymmetry of the second law of thermodynamics is more a reflection of the approximate nature of the models used and the fact that primary observers perceive the system only in a time forward direction. That fact combined with the fact that approximations (in general) always diverge with time explains the second law of thermodynamics quite nicely, at least for systems up to say laboratory scale (humans excluded).
One might object - but surely the perception of the observer is governed by the second law of thermodynamics, so that argument is recursive. The answer is that the second law of thermodynamics has no theoretical basis divorced from an observer so it cannot in and of itself explain the observer’s perceived direction of time. That would have to either be a fundamental property of an observer (e.g. in an LFW world) or would have to be derived from some other time asymmetric property of nature.
“By the way Anon, it would be nice if your comments contained a lot more arguments and a lot less rhetoric.”
You’ve made a number of broad claims. I asked for substantiation, since you declared abiogenesis about as likely as a person materializing on the other side of a wall and later explained the rejection of ID by scientists in terms of metaphysical bias against, of all things, libertarian free will and a priori agent causation (as opposed to reasons like those I linked to). Instead of providing it, you provided nothing more than rhetoric which I replied to. In cases where I could attempt to ferret out an argument, I linked to contradicting substance. In the end, this reduced into “my subjective perception that has a basis I’m not interested in articulating.”
“I did not base my personal estimate on a tornado in a junkyard model, because I think that is ridiculous.”
You did, however, endorse assuming the steps occur by random chance. If you are referring to random combination modeling, that is by definition “tornado in a junkyard.” If you didn’t mean that, it’s not at all clear what you did mean.
“A coherent theory of agent causation is much more subtle than “poof” - the more general field is known as process metaphysics.”
Clearly I was being flip, given that what you wrote didn’t have much connection to what I wrote, nor was it an accurate means of explaining why ID is rejected.
However,
“as the long term (i.e. positive) argument for ID critically depends on them.”
Ok. Please take Michael Behe’s “Irreducible Complexity” argument for the design of the blood clotting cascade and explain why it critically depends on Libertarian Free Will. I’m quite interested in hearing this.
“But certainly ad hominem characterizations are out of line.”
What ad hominem?
Wow. Tons of posts yesterday and I haven’t had time to read all of Mark’s latest tome.
I’d just say that I tend to agree that there are unfortunately a lot of over-generalizations on the anti-ID side. I think that there are some valuable inquiries in ID that are unfortunately all too often castigated unfairly by anti-IDers. Having said that I tend to think that a lot of this the ID folk bring on themselves by also engaging in a fair amount of polemic. Had they been a bit more careful and tried to separate themselves from the fundamentalists and especially creationists they might have done a bit better.
I do agree however that what the ID folk are attacking isn’t well developed. However I’m not sure that means we have reason to doubt nor does it mean that the biologists are being sloppy. Careful work moving from foundational physics into chemistry to show the continuity has really only been possible the last 10 years or so. I remember while at Los Alamos folks being terribly excited because they had finally modeled a water molecule using basic quantum mechanics. The calculations involved are tremendous. Moving from statistical mechanics to biology in a rigorous fashion seems unlikely to be successful beyond extreme simplifications for some time.
Having said that though I do think that the main bone of ID contention, the basis of macro-evolution, will be established in the laboratory within the next 5 - 10 years. Yes, that’s a matter of faith. But I think it will happen.
I also think that ID folk downplay some of the anti-ID arguments from outside of the realm of information theory or statistics. That is, akin to how one might dispute the 9/11 conspiracies without necessarily being able to explain the physics. Why would a divine being act in this fashion? I tend to find those sorts of arguments very compelling. Although I recognize they are somewhat subjective and not convincing to everyone.
Mark, thanks for the reference to Zemansky and the various definitions. We may need to refer to them in future discussions. The one for entropy seems a little strange—I’m familiar with the k * log(omega), but the “after adjusting for the macroscopic information content of the system” is not a part of the definition I have seen anywhere else.
I basically agree on comments of microscopic vs. macroscopic treatments. Regarding the supposedly mysterious second law and the arrow of time, the usual explanation is that it follows from special initial conditions. I gather this is good enough for most (including me), though I recognize there are some that are not completely satisfied with it.
Because I find that threads become too diffuse to be useful after 20 or 30 comments, this will be my last comment on this thread. Once things reach such a point I prefer that someone write a new post (here or some other blog) to refocus discussion on a particular point of interest. I will try to do so in the next few days on the subject of entropy and information, as I too am still interested in this.
One last point before bowing out on this thread… I was surprised to see agent causation and libertarian free will be tied so closely to ID. I have doubted agent causation’s status.
Anon, I readily admit that I should be more careful in wording.
First point of clarification - all statistical models are based on randomness, but that does not mean all outcomes are predicted with the same probability. If that were the case Quantum Mechanics (in its present state) would be useless. So jumping to the conclusion that I was using a “tornado in a junkyard model” was unwarranted. No serious ID advocate (Behe, Dembski, et al) has made any such claim (or even published numbers) regarding the formation rates of large scale bio-mechanical structures.
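As a small illustration of that first point, here is a sketch of a statistical model that is random yet far from equiprobable. It uses the standard Boltzmann distribution, which is brought in purely as a familiar example and is not the model being argued for here. The function name and the energy values are hypothetical.

```python
import math

def boltzmann_probabilities(energies, kT):
    """Probability of each state under a Boltzmann distribution:
    p_i = exp(-E_i / kT) / Z, with Z the sum of all the weights."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical energies, in units where kT = 1.
probs = boltzmann_probabilities([0.0, 1.0, 2.0], kT=1.0)
print([round(p, 3) for p in probs])

# Equal energies recover the equiprobable special case.
print(boltzmann_probabilities([1.0, 1.0, 1.0, 1.0], kT=1.0))
```

Every outcome is still a matter of chance, but lower energy states are strongly favored. The equiprobable case appears only when all the energies happen to be equal.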
I believe that general information theoretic considerations - rather than ones that model specific types of chemical reactions - will be much more useful in the long run. As Clark has indicated, QM simulations get computationally intractable in a big hurry. My argument in #3 and #8 relies not on any sort of published numbers, but completely on the information theoretic consequences of the type of initial conditions assumed in various models, in particular the consequences of assuming that the universe started out as a bundle of randomly distributed particles.
Sometimes people like to talk about the universe starting in a low entropy state - unfortunately that language is far too vague to be useful - a crystal near absolute zero is a low entropy state, but no one suggests a crystal is going to spontaneously develop higher order complexity, without outside interference of some kind, because it is obviously ridiculous.
What should be said instead is that from our perspective the universe appears to have started out in a high _information_ state. Information, aka macroscopic complexity, is much more interesting than random thermal noise, which never evolves into anything but more random thermal noise (second law of thermodynamics).
That begs the question of course of what sort of process explains the fact that information (macroscopic complexity) appears to be increasing, not decreasing, when the second law of thermodynamics indicates that, in a consistent model, an increase in entropy (or uncertainty) corresponds to a decrease in available information.
Some scientists (notably Erwin Schroedinger) have characterized life as not so much consuming energy as consuming “negentropy” or information. The main line of my argument is that the source of this information has to be explained.
Zero entropy *does not* imply high information. In conventional units, an empty vacuum has both zero entropy (no thermal noise) and zero information (no macroscopic structure).
On the other hand, a modern microprocessor IC at absolute zero has very high information (assuming the layout is previously known), but very low entropy. The entropy corresponds to unknown flaws in the crystalline structure of various parts of the IC.
One of the reasons for the confusion is that the statistical thermodynamicists (Brillouin et al) defined information in differential terms, i.e. without a baseline, comparable to the way that electrical potential lacks a baseline. That is an easy problem to fix, but standard nomenclature has not caught up - measuring the information content of a model just isn’t necessary for common thermodynamic problems, which focus on the random, unknown, entropic aspect after all.
Christian, ID would definitely be an intellectual curiosity in a conventional deterministic universe. The problem of machines designing other machines is interesting, but has obvious infinite regress problems.
I realize the subjectiveness or model dependent nature of the free energy / heat or information / entropy distinction is neither widely taught nor widely appreciated, but it appears to be the key to the whole issue here.
A typical example is the historical debate over Maxwell’s demon - i.e. can he open and shut the door to make entropy flow up hill. It has been shown that under normal conditions, no he could not. But under extremely rare conditions, he could just leave the door open and entropy would flow up hill anyway, entropy measured from a macroscopic perspective of course.
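To give a feel for how rare "extremely rare" is, here is a toy sketch. It assumes an ideal gas in which each particle is independently equally likely to be in either half of a container, so the chance of finding all N particles on one chosen side is 2^-N. The function name and the particle counts are hypothetical, and this is a textbook style simplification rather than a model of the demon itself.

```python
import math

def log10_prob_all_in_one_half(n_particles: int) -> float:
    """log10 of the chance that every one of n independent particles is
    found in one chosen half of the container: P = 2**-n."""
    return -n_particles * math.log10(2)

# Hypothetical particle counts, from a handful up to a macroscopic sample.
for n in (10, 100, 10**6, 10**23):
    print(n, log10_prob_all_in_one_half(n))
```

For ten particles the fluctuation happens about once in a thousand looks. For anything approaching a macroscopic sample the exponent itself becomes astronomically large, which is why such uphill flows are never observed at laboratory scale.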
Anon, current ID arguments are negative - “irreducible complexity” arguments (which certainly lack scientific rigor at this point) are based on pointing out the lack of tested scientific theories for how such a complex system could be formed - in particular that selective advantage does not apply to the mutations that would be required, unless they all occurred simultaneously, which is statistically implausible.
I am talking about the positive argument for Intelligent Design - i.e. establishing positive evidence rather than negative evidence that there is both Intelligence and Design in the universe - e.g. that we are actually novel and creative beings, not just deterministic or random deterministic machines, that our decisions are based at root on the ability to choose independent of both external and internal circumstances, that the past might have been otherwise, and that our future is not cast in stone.
That is one of the leading questions in philosophy, something that science has as yet provided little help with. The debate generally boils down to the rationality of holding people morally responsible for actions they cannot control - i.e. where they literally could not have done otherwise (given prior antecedents) - or where choosing differently would strictly be the result of random chance.
Anon, one last thing I was referring to the charge of “conspiratorial thinking” - a statement like that needs support to avoid appearing ad hominem.
I have heard the charge before, and it is unfair, unfounded, and untenable. A simple hypothesis that social, political, or historical considerations have something to do with the popularity of alternative theories has absolutely nothing to do with the perception or allegation of a “conspiracy” - which is, by definition, collusive in nature.
No serious commentator ever suggests collusion without serious evidence. I have not suggested that scientists have some sort of devious motive, just that there is a serious difference of opinion about basic metaphysical assumptions, something there is abundant evidence for. In the theory of human cognition, for example, it is the subject of constant academic debate.
I hardly think that ID can be classified a failure quite yet, by the way - as a scientific program it is in its embryonic stages. Many lines of inquiry take decades to percolate into something useful.
Just a note Christian. You might be interested in Sklar’s text on the philosophy of statistical mechanics. My copy is unfortunately buried in a pile of books at the moment, awaiting the repair of my carpet so my shelves can return. Maybe in the future I’ll write up a bit on it.
I should add that like you I tend to have my doubts about agent causation. The whole entropy sink problem to me is a big reason why I doubt it - so I suppose I’m in the opposite camp in that regard to Mark.
I used to have this fantastic undergraduate thermo text I’d picked up from my dad. It was rather advanced for an undergraduate text. But the best thing about it was a relatively long appendix that dealt with symmetries and applying Noether’s theorem to develop all the laws of thermodynamics from symmetries in the system under discussion. It was an amazingly helpful way of going about things. I’m sure it would have blown away any undergraduate who tried to learn thermo that way. But it was a very, very nice alternative to the traditional phenomenological approach (which makes it all nearly magic) and even the statistical approach. But it also, as I recall, touched in quite a pronounced way on the informational and philosophical aspects. Unfortunately in some move I lost the book and don’t even recall the authors. (I’m not sure the text proved popular, sort of like trying to teach undergraduate lower division physics from the Feynman lectures - they are masterful but probably not the best to teach novices - my Dad got a copy from a textbook salesman)
Mark, your ideas on applying information theory to natural selection are thought-provoking. If I understand correctly, what you are saying is that natural selection can reflect (in a sense) information, but cannot create it, or at least cannot produce a net increase. As an evolutionary biologist, I have not heard anyone talk about this with any clarity, probably because few people have a sufficient understanding of information theory. I would like to discuss this further if you would contact me (stanleyspencer {at} gmail.com).
If anyone knows how I can contact Mark Butler, please let me know. He hasn’t participated (that I have seen) in any online discussions for several months, so I suspect he may not read this post either.