Yesterday, I made a post talking about the probability of one functional protein 200 amino acids long being formed through undirected processes somewhere in the observable universe. I had to make a lot of guesses, but to give our protein its best chance, I made very generous estimates. Based on my estimates, I calculated a near 100% probability of the universe spitting out at least one functional protein 200 amino acids long.
Today, I thought I'd see what ChatGPT would say. I'll use the same probability equation, and the same line of reasoning, but I'll let ChatGPT come up with my estimates for me. Whenever ChatGPT gives a range, I'll use the upper end of the range (with one exception). Here's what ChatGPT said:
Stars in the universe: 1 x 10^21
Fraction of stars that could host planets in the habitable zone: 25%
How much carbon, nitrogen, hydrogen, and oxygen are on an average planet like earth?:
For rocky planets like earth...
H: 2%
C: 0.5%
N: 0.3%
O: 50%
ChatGPT didn't say, but I'm going to assume those percentages are by mass. It looks like ChatGPT is just considering earth's crust, too, which is good. That's what I want.
I wanted to know which of these would be the limiting factor, so I asked ChatGPT how many of each atom we would have if we took one of each of the 20 usual amino acids and added up all the hydrogen, carbon, nitrogen, and oxygen in them. A couple of them have sulfur, but I'm going to ignore that for simplicity. ChatGPT said,
C: 101
H: 161
N: 29
O: 49
It looks like either carbon or hydrogen is going to be the limiting factor. Let's go with carbon.
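Here is a rough Python sketch of one way to check which element runs out first. It assumes the percentages above really are by mass, and it brings in standard atomic masses, which ChatGPT did not supply. The function name `sets_per_gram` is just made up for illustration.

```python
# Mass fractions quoted above (assumed to be by mass).
mass_fraction = {"H": 0.02, "C": 0.005, "N": 0.003, "O": 0.50}
# Standard atomic masses in g/mol (an outside input, not from ChatGPT's answer).
atomic_mass = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}
# Atoms needed for one set of the 20 amino acids, as quoted above.
atoms_per_set = {"C": 101, "H": 161, "N": 29, "O": 49}


def sets_per_gram(element):
    """Moles of full amino acid sets one gram of rock could supply for this element."""
    moles_per_gram = mass_fraction[element] / atomic_mass[element]
    return moles_per_gram / atoms_per_set[element]


# The element that supports the fewest sets is the limiting one.
ranking = sorted(atoms_per_set, key=sets_per_gram)
limiting = ranking[0]
print(limiting, ranking)
```

Under these assumptions carbon supports the fewest sets, with nitrogen next, so going with carbon looks reasonable.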
What is the average length of a protein?
ChatGPT said 300 to 400. This time, I'm going to go with that lower limit of 300.
What is the average lifespan of a star?
ChatGPT gave three estimates: one for red dwarfs, one for high mass stars, and one for sun-like stars. The red dwarfs live a really long time, but are mostly uninhabitable because of how active they are, and massive stars don't live very long at all, so I'm just going to go with sun-like stars. The average there is 10 billion years.
It seems unreasonable to use the entirety of earth's mass in my calculation because proteins aren't going to form in the mantle or in earth's core. So I asked ChatGPT how much of earth's mass makes up the lithosphere. ChatGPT said 1 to 2%, so I'm going to go with 2%. Earth's mass is 5.7 x 10^24 kg, so the lithosphere must be 1.14 x 10^23 kg.
Let's do some calculations.
First, I'm still going to assume 1 try per second.
I'm going to assume all the amino acids are in one big soup.
The mass of the earth's lithosphere is 1.14 x 10^23 kg. 0.5% of that is carbon, so there's 5.7 x 10^20 kg of carbon in the lithosphere. An average carbon atom weighs 1.99 x 10^-26 kg, so there are about 2.86 x 10^46 carbon atoms in the lithosphere.
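The carbon count is easy to double check in a few lines of Python. This is only a sketch using the figures quoted in this post. The variable names are my own.

```python
# Figures quoted in the post.
lithosphere_mass_kg = 1.14e23
carbon_mass_fraction = 0.005      # 0.5% carbon, assumed to be by mass
carbon_atom_mass_kg = 1.99e-26    # mass of one carbon atom

carbon_mass_kg = lithosphere_mass_kg * carbon_mass_fraction
carbon_atoms = carbon_mass_kg / carbon_atom_mass_kg

print(f"{carbon_mass_kg:.2e} kg of carbon, {carbon_atoms:.2e} carbon atoms")
```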
You need 101 carbon atoms for a full set of the 20 standard amino acids, so with those carbon atoms, you can create 2.83 x 10^44 full sets. Each set has 20 amino acids, so that's 1.42 x 10^43 individual amino acids per planet.
An average protein has 300 amino acids, so that's 4.73 x 10^40 proteins per planet. That's going to be the number of tries per second per planet.
There are 1 x 10^21 stars, and 25% of them have planets in the habitable zone, so that's 2.5 x 10^20 planets in the habitable zone.
Although earth has 10 billion years, only 5 billion of that will have life on it. Proteins need to form in a shorter span than 5 billion years if there are to be multiple species and diversity, so I'm going to give each planet 2 billion years to create an average protein. That's 6.31 x 10^16 seconds.
Now, I think we can calculate the number of tries.
(1 try/sec) x (6.31 x 10^16 sec) x (4.73 x 10^40 proteins/planet) x (2.5 x 10^20 planets) = 7.46 x 10^77 protein tries. This is getting interesting.
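For anyone who wants to re-run the multiplication, here is a small Python sketch. It takes the per-planet figure of 4.73 x 10^40 as given, and it assumes a year of 365.25 days when converting 2 billion years to seconds. The variable names are only for illustration.

```python
# Assumption: a year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

tries_per_second = 1                      # one try per second per protein-sized batch
seconds = 2e9 * SECONDS_PER_YEAR          # 2 billion years per planet
proteins_per_planet = 4.73e40             # per-planet figure from the post
planets = 1e21 * 0.25                     # stars x fraction with habitable-zone planets

total_tries = tries_per_second * seconds * proteins_per_planet * planets

print(f"{seconds:.2e} seconds, {total_tries:.2e} tries")
```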
Now, we can plug that into our equation using the Douglas Axe estimate of 1 functional protein for every 10^77 proteins of a given length. He used 150 amino acids, but I'm assuming the fraction is the same for all lengths.
\[ \normalsize 1 - \left(1 - \frac{1}{10^{77}}\right)^{7.46 \times 10^{77}} \]
The exponent is pretty close to the denominator, so we could get a real probability here. Since I can't put those huge exponents in my calculator, I played around. I tried replacing the 10^77 in both places with 2, 10, 100, 1000, and 1,000,000. I got pretty close to the same result each time, so I'll bet that's what it is. The probability came out to be 99.9%, which means you'd be practically guaranteed to get a functional protein.
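The expression can also be evaluated directly without the substitution trick. Ordinary floating point rounds 1 - 1/10^77 to exactly 1, which is why a calculator chokes on it. The Python sketch below uses `math.log1p` and `math.expm1` from the standard library to sidestep that rounding. The function name `prob_at_least_one` is made up for this example.

```python
import math


def prob_at_least_one(p_single, tries):
    """1 - (1 - p_single)**tries, computed without losing the tiny p_single."""
    # log1p(-p) is log(1 - p) and stays accurate for very small p;
    # -expm1(x) is 1 - e**x and stays accurate for x near zero.
    return -math.expm1(tries * math.log1p(-p_single))


p = prob_at_least_one(1e-77, 7.46e77)

# The naive form fails because (1 - 1e-77) rounds to exactly 1.0.
naive = 1 - (1 - 1e-77) ** 7.46e77

print(f"stable: {p:.5f}, naive: {naive}")
```

The stable version agrees with the shortcut 1 - e^(-7.46), which is the limit the substituted values of 2, 10, 100, and so on were closing in on.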
It is possible that I made a math error. I've gone through and corrected myself two or three times since posting this, so there's a possibility I could go through it again and find another mistake.
A lot of these numbers are speculative. I guess you can get whatever probability you want depending on how you massage the numbers. You can be generous or stingy with your assumptions. As I said in the last post, I think the pivotal unknown is the fraction of proteins of a given length that could be functional out of all the possible sequences of amino acids in a given length. I suggested in the last post how we might be able to figure that out with the new AlphaFold AI thingy. Since nobody has done it, as far as I know, I used Douglas Axe's estimate, which, as I explained in the last post, I'm not so sure about.
One thing I learned in this whole thing is that if you're just looking for any functional protein, the length of the protein doesn't figure into the probability (except when you're determining how many proteins you're going to get per planet with your available amino acids). All that matters is what fraction of proteins of any given length will be functional. That fraction may, for all we know, be the same regardless of length. But like I said in the last post, we don't necessarily know that the fraction is the same in all lengths. The only way to figure that out is through experimentation or simulation. Assuming it's the same for all lengths, the length only figures into the probability if you're looking for one particular sequence of that length. Then the length matters a great deal to the probability.
You could make the length relevant if you considered the probability of different lengths with any sequence. It does seem like the longer a sequence is, the less probable it is. On the other hand, that may have a lot to do with how it is formed. If you had two proteins 200 units long, and they merged in one event, you'd have one 400 units long. That would be easier than if you had one 200 units long and it mutated through successive generations until it grew to 400 units. It's probably simpler to leave this probability out.
One interesting thing I took from this is that if you ignore the 1 x 10^21 stars in the universe and all the planets surrounding them, and you focused only on earth, the probability of getting any functional protein on earth would be almost non-existent. But if you include the whole observable universe, then you're guaranteed to get the functional protein somewhere in the universe. So there's a sense in which we really did win the lottery here on earth.
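To put rough numbers on the one-planet versus whole-universe comparison, here is a Python sketch using the same figures as this post: 6.31 x 10^16 seconds, 4.73 x 10^40 tries per second per planet, 2.5 x 10^20 planets, and Axe's 1 in 10^77. The function and variable names are only illustrative.

```python
import math


def prob_at_least_one(p_single, tries):
    """1 - (1 - p_single)**tries, computed in a numerically stable way."""
    return -math.expm1(tries * math.log1p(-p_single))


P_FUNCTIONAL = 1e-77                  # Axe's estimate, as used in this post
tries_one_planet = 6.31e16 * 4.73e40  # seconds x tries per second per planet
planets = 2.5e20

p_one_planet = prob_at_least_one(P_FUNCTIONAL, tries_one_planet)
p_universe = prob_at_least_one(P_FUNCTIONAL, tries_one_planet * planets)

print(f"one planet: {p_one_planet:.2e}, all planets: {p_universe:.4f}")
```

With these inputs a single planet comes out at roughly 3 in 10^20, while all the planets together come out above 99.9%.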
That's assuming, of course, that there's some validity to my thought experiment. It is, admittedly, speculative. It uses a lot of really rough estimates and simplifications. If there is some validity to it, it would answer the Fermi paradox. Life in the universe is extremely rare. Advanced intelligent life like ours even more so.
Wait! There's more! I wrote a third post on this topic after looking further into estimates for functional to non-functional amino acid sequences and after getting some feedback from Paul Scott Pruett.