Friday, February 14, 2025

Protein evolution probability, take three

Wow, this is my third post in a week on this one topic. You'd think I found it interesting or something!

I've been reading around to try to find out how controversial or accepted Douglas Axe's 1 in 10^77 functional protein estimate is, and it turns out it's very controversial. Other people have made other estimates in which the ratio of functional to non-functional proteins is a lot higher than what Douglas Axe estimated. This paper, for example, estimates that 1 in 10^11 proteins are functional. It says,

In conclusion, we suggest that functional proteins are sufficiently common in protein sequence space (roughly 1 in 10^11) that they may be discovered by entirely stochastic means, such as presumably operated when proteins were first used by living organisms. However, this frequency is still low enough to emphasize the magnitude of the problem faced by those attempting de novo protein design.

Since this estimate is many orders of magnitude greater than what Douglas Axe estimated, I want to do a rough back-of-the-napkin estimate of the probability of getting a functional protein just in the Milky Way Galaxy within 1 billion years, using much stingier probabilistic resources than I used in my last couple of posts on this subject (here and here).

I'll assume there are 100 billion stars in the galaxy, 7% are G-type stars, only G-type stars are working on the problem, and only 20% of them have planets in the habitable zones. That's 1.4 x 10^9 planets working on the problem.

I'll assume the same proportion of carbon, oxygen, hydrogen, and nitrogen in the lithosphere of each planet, but only a small fraction is available to try to make proteins. Instead of taking the elements out of the entire lithosphere, I'll take them out of a volume about the size of Crater Lake.

I asked two different AIs to estimate the mass of the water in Crater Lake. One said about 10^13 kg, and the other said about 10^12 kg, so let's go with 10^12 kg. I'll spare you all the details I didn't spare you last time and just tell you I calculated that there would be 2.5 x 10^36 carbon atoms, which allows you to make 1.67 x 10^11 proteins with 300 amino acids each.

With 1.4 x 10^9 planets making 1.67 x 10^11 proteins per second for 1 billion years (i.e. 3.1536 x 10^16 seconds), that comes out to a total of 7.37 x 10^36 tries in all. Let's simplify that to 10^36 and plug it into our equation to get the probability of finding a functional de novo protein.
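The multiplication above is easy to re-run. Here is a minimal Python sketch of it, using only the numbers quoted in this post; the variable names are mine, and the rate of 1.67 x 10^11 proteins per second per planet is taken as given rather than rederived.

```python
# Inputs as stated in the post (variable names are illustrative).
stars_in_galaxy = 100e9         # assumed star count for the Milky Way
g_type_fraction = 0.07          # assumed share of G-type stars
habitable_fraction = 0.20       # assumed share with habitable-zone planets
proteins_per_second = 1.67e11   # per planet, taken as given from the post
seconds_per_year = 365 * 24 * 3600   # 3.1536e7, ignoring leap years
years = 1e9

planets = stars_in_galaxy * g_type_fraction * habitable_fraction
seconds = years * seconds_per_year
total_tries = planets * proteins_per_second * seconds

print(f"planets     = {planets:.2e}")
print(f"seconds     = {seconds:.4e}")
print(f"total tries = {total_tries:.2e}")
```

Changing any one input (say, the number of years) and re-running shows how sensitive the total is to that assumption.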

\[ \normalsize 1 - \left(1 - \frac{1}{10^{11}}\right)^{10^{36}} \]

There you have it. It looks like you'd be guaranteed to find a functional protein. Again, I have no idea if the estimate for the fraction of functional to non-functional proteins is correct, so I still don't know if these calculations are worth anything. But based on these estimates, it looks like it's very likely you could get de novo proteins, even with stingy probabilistic resources, somewhere in the galaxy.
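For anyone who wants to check the probability expression above, here is a small Python sketch. It evaluates the formula through logarithms (math.log1p and math.expm1), which is a numerical choice of mine to avoid rounding trouble with numbers this extreme. The two extra quantities at the end, the expected number of hits and the number of tries needed for even odds, are my additions and are not part of the original calculation.

```python
import math

p_functional = 1e-11   # fraction of functional sequences (the quoted estimate)
tries = 1e36           # rounded-down total number of tries from the post

# Log of the probability that every single try fails: n * ln(1 - p).
log_all_fail = tries * math.log1p(-p_functional)

# Probability of at least one success: 1 - exp(n * ln(1 - p)).
p_at_least_one = -math.expm1(log_all_fail)

# Extra context (not in the post): expected hits and tries for a 50% chance.
expected_hits = tries * p_functional
tries_for_even_odds = math.log(0.5) / math.log1p(-p_functional)

print(f"P(at least one functional protein) = {p_at_least_one}")
print(f"expected number of hits            = {expected_hits:.2e}")
print(f"tries needed for a 50% chance      = {tries_for_even_odds:.2e}")
```

With these inputs the probability is indistinguishable from 1 in floating point, and even odds are reached after only about 7 x 10^10 tries, so the rounding from 7.37 x 10^36 down to 10^36 makes no difference to the conclusion.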

Unless I hear of some solid uncontroversial estimates of the ratio of functional to non-functional proteins of average length, I think I'm probably going to say the argument against evolution from the improbability of de novo protein evolution is not a good argument. It relies too heavily on controversial estimates. It may turn out to be valid if more information comes in, but we'll just have to wait and see. It could also be made valid by taking into consideration more of the details about how proteins are made and how cells work. More knowledge about exoplanets and the chemistry of the early Earth may also contribute.

Some final thoughts

I emailed Mr. Pruett, who I mentioned in the first post, to solicit his feedback on that first post. He knows a lot more about this topic than I do. Based on what he said, there are a lot more complications in coming up with probabilities than are reflected in my thought experiment. For example, I ignored how genes actually work, including all the machinery needed to build proteins. I ignored the fact that genes can be altered somewhat without altering the resulting protein. There's also the issue of some proteins requiring other proteins in order to fold up correctly. They don't all just fold themselves. A realistic thought experiment, I'm afraid, would be really complicated.

My strategy has been similar to what we used to do in my calculus classes in college. I remember in one of the classes, we had to figure out whether an equation that spits out a series of numbers was convergent or divergent. If the equation is too complicated to figure that out, you can simplify the equation in such a way that you know it's either more or less likely than the original equation to be convergent or divergent. If you're testing for convergence, and you know your simplification is less likely to be convergent than the original equation, but it converges anyway, then you know your original equation is convergent.
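What I'm describing is usually called the comparison test. Here is a textbook-style instance, picked only for illustration (it is not from the class I took): the series with terms 1/(n^2 + 1) is awkward to sum directly, but each term is no bigger than the matching term of the simpler series with terms 1/n^2, which is known to converge.

\[ \normalsize 0 \le \frac{1}{n^2 + 1} \le \frac{1}{n^2} \ \ (n \ge 1), \qquad \sum_{n=1}^{\infty} \frac{1}{n^2} \text{ converges} \ \Longrightarrow \ \sum_{n=1}^{\infty} \frac{1}{n^2 + 1} \text{ converges} \]

The simplified series is the "harder case" for convergence because its terms are larger, so if it converges, the original must too. My thought experiment tries to play the same game: pick a simplification that stacks the deck in a known direction and see whether the conclusion still holds.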

Mr. Pruett also pointed out that I over-complicated part of my calculation. I could've just started with 10^80 atoms in the universe and figured out how many of them are carbon atoms, and gone from there. I didn't have to talk about star types, habitable planets, lithospheres, etc.

Mr. Pruett made a good point I wish I had considered. I gave very generous time constraints on building proteins, but if I wanted to test de novo genes in already existing species, those appear to pop up pretty quickly in nature. The Cambrian Explosion only lasted maybe 30 million years, and lots of new genes (and their corresponding proteins) had to have come into existence during that short window of time. That's three orders of magnitude less time than my original 13.8 billion year estimate and two orders of magnitude less than my more restricted estimates of 1 to 5 billion years.
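The "orders of magnitude" comparison above is just a base-10 logarithm of a ratio. Here is a quick Python sketch that checks it, assuming the 30 million year figure for the Cambrian Explosion used in this post; the function name is made up for this example.

```python
import math

CAMBRIAN_YEARS = 30e6  # duration assumed in the post

def orders_of_magnitude_longer(years: float) -> float:
    """How many powers of ten longer `years` is than the Cambrian window."""
    return math.log10(years / CAMBRIAN_YEARS)

# Time spans used across these posts.
for label, years in [("13.8 billion", 13.8e9), ("5 billion", 5e9), ("1 billion", 1e9)]:
    ratio = years / CAMBRIAN_YEARS
    print(f"{label} years: {ratio:.0f}x longer, "
          f"{orders_of_magnitude_longer(years):.1f} orders of magnitude")
```

The exact values come out to roughly 2.7, 2.2, and 1.5 orders of magnitude, so "three" and "two" are round figures.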

Mr. Pruett made an interesting psychological point. Suppose we calculated that it's nearly impossible for the universe to cough up certain functional proteins, but we went out in nature and discovered that they exist. It's unlikely that a biologist would say, "Wow, that's a miracle." It's more likely they would say, "I guess nature is more clever than we thought." When it comes to trying to figure out whether nature could do something on its own or whether it needs divine assistance, our worldview presuppositions are probably going to carry more weight than our calculations.

I'm not necessarily saying that they shouldn't. After all, a person might have good reason for subscribing to their worldview. If I make some calculation that allows me to make a prediction about what I should expect to find in nature, and I go out in nature and find that things are very different, I probably should doubt the assumptions that went into my calculation. I mean that's how science works. You come up with a hypothesis, you make a prediction based on your hypothesis, and you test it by making observations to see if the prediction pans out.

I think what the protein evolution probability argument attempts to do is not test the assumptions that go into the calculation, but to test the worldview of naturalism. If you assume naturalism as part of your hypothesis, and you use various assumptions to make a calculation that predicts something about proteins, and you go out in nature and find out that your prediction was wrong, that is supposed to cast doubt, not on the assumptions that went into your calculation, but on the assumption of your worldview. Somebody who subscribes to naturalism who runs the same experiment and falsifies their prediction is going to question the assumptions that went into their calculation rather than their naturalistic worldview. And maybe they should. I don't know. I guess at that point it depends on whether you're more sure about your worldview or more sure about the assumptions that went into your calculations, not to mention your confidence in entering them in your calculator correctly.

Anyway, thank you for joining me on this journey. It's been interesting for me.
