Newsportal USENET - Re: Luskin still doesn't get DNA sequence analysis

On 7/1/2025 11:31 PM, John Harshman wrote:

On 7/1/25 8:07 AM, RonO wrote:
On 6/30/2025 10:02 PM, John Harshman wrote:
On 6/30/25 1:32 PM, RonO wrote:
On 6/30/2025 1:44 PM, John Harshman wrote:
On 6/30/25 9:01 AM, RonO wrote:
https://evolutionnews.org/2025/06/on-human-chimp-genetic-differences-the-critics-misstate-my-arguments/
>
He is claiming that his critics are not addressing his claims, or are misstating them. Luskin is falsely claiming that the 1% difference between chimp and human genomic DNA, which was measured for the sequence compared to determine the evolutionary relationships among extant life forms, is some kind of Icon of evolution. It is just what the sequence difference was. It wasn't the 1% figure itself that showed chimps are the species most closely related to humans; it was how that 1% difference compared with the differences to all the other species for which we had the same sequence.
>
Coding sequence is still 0.7% different between chimps and humans, and the sequence around the genes that we can compare is between 1 and 1.8% different. We can't use much of the sequence around the coding sequence to compare extant species, because it is so different between species that it can't be aligned accurately. I took around 800 base-pairs of the tyrosinase exon 1 coding sequence and the following 900 base-pairs of the first intron. Chimps and humans were 99% similar by BLAST for this sequence. Green monkey was 95% similar to human over the whole sequence, but the coding sequence was less than 3% different while the intron sequence was almost 9% different (there were also a couple of indels in the intron). I tried the same sequence for mouse, and BLAST would not align the intron sequence; it was just too different. Even under more relaxed BLAST settings it would only align bits of the intron, in pieces of around 60 base-pairs where there was enough similarity that they might be the same sequence (many of these short matches were not the same sequence and were on other chromosomes of the genome assembly). The 826 base-pairs of coding sequence were 87% similar, but the intron sequence could not be matched up. It might be expected to be around 60% similar if you could align it, but any alignment would likely not be very accurate.
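The similarity percentages above are identities over an alignment, which is what BLAST reports. A minimal Python sketch of that arithmetic, assuming two sequences already aligned to equal length with '-' for gaps and a hypothetical `boundary` index marking where the coding sequence ends and the intron begins. The toy sequences are invented, not real tyrosinase data, and counting gap columns as mismatches is only one convention among several.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns where both sequences carry the same base.

    Assumes a and b are already aligned (equal length, '-' for gaps).
    Gap columns count as mismatches here.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


def split_identity(a: str, b: str, boundary: int) -> tuple[float, float]:
    """Identity for the coding part [0, boundary) and the intron part [boundary, end)."""
    coding = percent_identity(a[:boundary], b[:boundary])
    intron = percent_identity(a[boundary:], b[boundary:])
    return coding, intron


if __name__ == "__main__":
    # Invented aligned pair: first 10 columns stand in for exon, last 10 for intron.
    human = "ATGCTCCTGGCTGTTTTGTA"
    other = "ATGCTCCTGGCAGT-TTCTA"
    exon, intron = split_identity(human, other, boundary=10)
    overall = percent_identity(human, other)
    print(f"coding {exon:.0f}%, intron {intron:.0f}%, overall {overall:.0f}%")
```

On the toy pair it prints `coding 100%, intron 70%, overall 85%`, which shows how an overall figure can hide a much lower intron identity.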
>
This just means that the sequence we determined to be around 1% different (coding sequence) is the sequence that would be used to determine evolutionary relationships between mammals. It is the sequence that can be used to determine that chimps are the species living on this planet today that are most closely related to humans. But outside of mammals the accuracy of coding sequence decreases because of the degenerate code: third codon positions can be substituted multiple times, and you can't tell whether a site has mutated once or several times. You can only use noncoding sequence like introns to compare closely related taxa because they change too fast to be useful.
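Whether a difference falls at the first, second, or third codon position is easy to tally from an in-frame alignment. A minimal sketch, assuming gap-free, in-frame coding sequences of equal length; note that it only counts observed differences, so a third-position site that changed twice (or changed and reverted) is exactly what this count cannot see.

```python
from collections import Counter


def differences_by_codon_position(a: str, b: str) -> dict[int, int]:
    """Count mismatches at codon positions 1, 2 and 3.

    Assumes a and b are in-frame, gap-free coding sequences of equal length
    that start at the first base of a codon.
    """
    if len(a) != len(b) or len(a) % 3 != 0:
        raise ValueError("expected equal-length, in-frame coding sequences")
    counts = Counter({1: 0, 2: 0, 3: 0})
    for i, (x, y) in enumerate(zip(a.upper(), b.upper())):
        if x != y:
            counts[i % 3 + 1] += 1  # i % 3 == 2 is the third (wobble) position
    return dict(counts)


if __name__ == "__main__":
    # Invented example: every difference is a synonymous third-position change
    # (CTC->CTG and CTG->CTA both encode Leu, GCT->GCC encodes Ala).
    print(differences_by_codon_position("ATGCTCCTGGCT", "ATGCTGCTAGCC"))
```

On the invented pair it reports `{1: 0, 2: 0, 3: 3}`.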
>
All fine except for that last bit. Introns and other junk sequences are perfectly useful to considerable phylogenetic depth. They work for the deepest divergences in birds, for example, and I imagine they work for pretty deep divergences in mammals too, certainly for all primates.
>
That is what I meant when I wrote that "You can only use noncoding sequence like introns to compare closely related taxa because they change too fast to be useful." I just didn't spell out that I meant too fast to be useful for discriminating deeper divergences.
>
Perhaps we have different understandings of "closely related". Are all mammals closely related?
>
No. I did not mean as deeply as you seem to be claiming. I noted that the human and mouse intron sequences could not be matched up. Humans and mice separated earlier than a lot of avian lineages did, but not by much, and you can't use introns for any accurate phylogenetic analysis of such mammalian lineages. Birds seem to be a separate case: their introns are shorter, and parts of them are more conserved. I knew that intron sequences had been used in phylogenetic analyses to place some avian lineages, but not for the deepest branch nodes.
Not true. Even the deepest avian nodes can be resolved using introns. There are even a few introns that can be aligned between birds and crocodiles, though they're unusually slow-evolving.

I did find that out, but you can't match up the introns of mammals. I tried to do it decades ago (I had a postdoc working with cattle), and for most introns you could not match up mouse and human, or cattle and human, introns. There were some highly conserved sequences in some introns, but they were likely regulatory sequences. BLAST would not match up the mouse and human tyrosinase intron; under relaxed conditions it would produce short matches, but many of those were to other chromosomes and not to the intron sequence.
Further on, I do find that avian introns are more conserved. I worked with chicken sequence for decades and never bothered to look at the intron sequences because of my experience with mammals.
I worked on an intron sequence that was highly conserved between birds and mammals: it was the regulatory sequence for Sonic hedgehog, but it sat in an intron of a neighboring gene. SNPs in that enhancer resulted in extra digits in both mammals and birds.

Introns of course vary in evolutionary rate, both across taxa and within genomes. But I think the usefulness of introns even in mammal phylogeny is generally underrated.

Try to match up the mammalian introns. I gave the example of tyrosinase intron 1. BLAST would not match it up, so the similarity between mice and humans is less than 55%.

I checked chickens and pigeons for the same tyrosinase intron that could not be matched up between mice and humans. About a third of the sequence was likely not good for phylogenetic analysis (60% similarity, with multiple indels needed to make those matches), but the other two-thirds was as high as 80% similar over 60 bp stretches with only a few indels, so you could pick intron sequences that could still be aligned. I wouldn't want to maximize sequence similarity by inserting indels into intron sequence.
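Picking out the stretches of an intron that are still alignable can be done by scanning a pairwise alignment in fixed windows. A minimal sketch, assuming the alignment has already been produced; the 60-column window mirrors the stretches described above, while the 75% cutoff and the function names are hypothetical choices for illustration.

```python
def window_identity(a: str, b: str, window: int = 60, step: int = 60) -> list[tuple[int, float]]:
    """Return (start column, percent identity) for each full window of an alignment.

    Assumes a and b are aligned to equal length with '-' for gaps;
    gap columns count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(a) - window + 1, step):
        wa = a[start:start + window].upper()
        wb = b[start:start + window].upper()
        matches = sum(1 for x, y in zip(wa, wb) if x == y and x != "-")
        results.append((start, 100.0 * matches / window))
    return results


def alignable_windows(a: str, b: str, window: int = 60, cutoff: float = 75.0) -> list[int]:
    """Start columns of non-overlapping windows at or above a hypothetical identity cutoff."""
    return [start for start, pct in window_identity(a, b, window, window) if pct >= cutoff]


if __name__ == "__main__":
    # Invented 120-column alignment: a conserved block followed by a divergent one.
    seq1 = "A" * 120
    seq2 = "A" * 60 + "AC" * 30
    print(window_identity(seq1, seq2))
    print(alignable_windows(seq1, seq2))
```

For the invented pair it reports `[(0, 100.0), (60, 50.0)]` and keeps only the window starting at column 0.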
Why not? The indels are there, and generally easy to align. They can often be used as informative characters.

Because once you get down to around 75% similarity, placing indels in the sequence likely produces spurious matches in the sequence flanking each indel. Some analyses that use sequence with indels remove the affected sequence and some surrounding sequence from the analysis. If you just count the indel as a single event, you would likely be making spurious matches in the sequence immediately flanking it, because the indel is inserted where it maximizes the similarity and not where it may have actually occurred.
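Removing indel-affected sequence together with some surrounding sequence amounts to masking alignment columns. A minimal sketch, assuming a pairwise alignment with '-' for gaps; the `flank` width is a hypothetical parameter, and published analyses choose their own trimming rules or use dedicated trimming software.

```python
def mask_indel_flanks(a: str, b: str, flank: int = 3) -> tuple[str, str]:
    """Drop every gap column plus `flank` columns on each side of it.

    Assumes a and b are aligned to equal length with '-' for gaps.
    Returns the two sequences with the masked columns removed.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    n = len(a)
    drop = [False] * n
    for i in range(n):
        if a[i] == "-" or b[i] == "-":
            # Mark the gap column and its flanking columns for removal.
            for j in range(max(0, i - flank), min(n, i + flank + 1)):
                drop[j] = True
    keep = [i for i in range(n) if not drop[i]]
    return "".join(a[i] for i in keep), "".join(b[i] for i in keep)


if __name__ == "__main__":
    # Invented alignment with one single-base gap at column 4.
    print(mask_indel_flanks("ACGTACGTAC", "ACGT-CGTAC", flank=2))
```

With `flank=2`, columns 2 through 6 are dropped and the call returns `('ACTAC', 'ACTAC')`, so only columns well away from the indel are scored.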

The conserved regions likely averaged 75% similarity. You'd likely need some type of transition/transversion analysis to estimate double hits, and some estimate of the rate of change at nonconserved sites relative to the conserved sequence.
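Separating transitions from transversions to correct for multiple hits is what the Kimura two-parameter (K2P) distance does. A minimal sketch, assuming a pairwise alignment in which columns with gaps or ambiguous bases are simply skipped; it does not handle rate variation between conserved and nonconserved sites, which would need a further correction such as a gamma model.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance (substitutions per site) for two aligned sequences.

    Columns containing a gap or a non-ACGT character are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in valid or y not in valid:
            continue
        sites += 1
        if x == y:
            continue
        same_class = (x in PURINES and y in PURINES) or (x in PYRIMIDINES and y in PYRIMIDINES)
        if same_class:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # Invented pair: one A->G transition in 10 sites (raw difference 0.1).
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

The invented pair prints 0.1116 against a raw difference of 0.1. The corrected distance is never smaller than the raw proportion of differing sites, and the gap between them widens as sequences diverge; that widening is the double-hit correction.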
I recommend a maximum likelihood model.
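As a toy illustration of what a maximum likelihood estimate does, the simplest case is the distance between two sequences under the Jukes-Cantor (JC69) model, found by numerically maximizing the likelihood. This is a sketch, not how PHYLIP or any other package is implemented: real ML analyses optimize whole trees under richer substitution models. The principle is the same, though, and for JC69 the answer can be checked against the known closed-form distance.

```python
import math


def jc69_log_likelihood(d: float, n_sites: int, n_diff: int) -> float:
    """Log-likelihood of distance d (substitutions/site) under JC69, constant terms dropped."""
    x = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * x  # probability a site shows the same base in both sequences
    p_diff = 0.25 - 0.25 * x  # probability of one particular different base
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


def jc69_ml_distance(n_sites: int, n_diff: int, lo: float = 1e-9,
                     hi: float = 10.0, tol: float = 1e-10) -> float:
    """Find the d that maximizes the JC69 likelihood with a golden-section search."""
    if not 0 < n_diff < 0.75 * n_sites:
        raise ValueError("need 0 < differences < 75% of sites")
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    e = a + ratio * (b - a)
    while b - a > tol:
        if jc69_log_likelihood(c, n_sites, n_diff) > jc69_log_likelihood(e, n_sites, n_diff):
            b, e = e, c  # maximum lies in [a, e]
            c = b - ratio * (b - a)
        else:
            a, c = c, e  # maximum lies in [c, b]
            e = a + ratio * (b - a)
    return (a + b) / 2.0


def jc69_closed_form(p: float) -> float:
    """Analytical JC69 distance for a proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    ml = jc69_ml_distance(100, 13)
    print(f"ML {ml:.4f}, closed form {jc69_closed_form(0.13):.4f}")
```

For 13 differences in 100 sites (the 87% figure from the mouse-human coding comparison, used purely as an illustration), both estimates print as 0.1428, noticeably larger than the raw 0.13.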

I used ML in Felsenstein's PHYLIP back in the 1980s. I was working with nematodes and other invertebrates and was usually dealing with divergent sequences.
Ron Okimoto

The divergence estimate between mice and humans is around 90 million years, and between chickens and pigeons around 89 million years (estimates I got from Google).
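Because the two divergence times are nearly equal, the mouse-human and chicken-pigeon intron comparisons are on roughly equal footing. One way to make that explicit is the usual per-lineage rate relation, assuming both lineages have accumulated changes independently since the split and ignoring ancestral polymorphism. The Jukes-Cantor correction and the use of the 87% mouse-human coding figure are illustrative choices, not part of the original analysis.

$$ r = \frac{d}{2T}, \qquad d_{\mathrm{JC}} = -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}p\right) $$

$$ p = 0.13 \;\Rightarrow\; d_{\mathrm{JC}} \approx 0.143, \qquad r \approx \frac{0.143}{2 \times 9.0 \times 10^{7}\ \text{yr}} \approx 7.9 \times 10^{-10}\ \text{substitutions per site per year} $$

$$ \frac{r_{\text{mouse-human}}}{r_{\text{chicken-pigeon}}} = \frac{d_{\text{mouse-human}} / (2 \times 90\ \text{Myr})}{d_{\text{chicken-pigeon}} / (2 \times 89\ \text{Myr})} \approx \frac{d_{\text{mouse-human}}}{d_{\text{chicken-pigeon}}} $$

With the times within about 1% of each other, differences in intron similarity between the two pairs reflect differences in rate rather than in time since divergence.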
>
Ron Okimoto
>
The additional sequence that Luskin is beefing about is sequence we could not obtain for all taxa, and it still can't be accurately compared between the taxa with complete genome sequences, because it is repetitive, its copy number varies rapidly between species, and the sequence of the heterochromatin repeats evolves rapidly.
>
Luskin is literally beefing about something that never mattered, and still does not matter.
>
The mitochondrial DNA sequence is 8.9% different between chimps and humans and still indicates that chimps are the species most closely related to humans. In the 1980s, mitochondrial DNA was the first sequence used to determine that, of the other great apes, chimps were our closest relative. It wasn't the 1% genomic coding sequence difference. For the tyrosinase sequence I used in the analysis above, both gorilla and chimp are 99% similar to human by BLAST alignment (16 mismatches with gorilla and 20 with chimp).
>
Ron Okimoto

Date	Subject	#	Author
30 Jun 17:01	Luskin still doesn't get DNA sequence analysis	8	RonO
30 Jun 19:44	Re: Luskin still doesn't get DNA sequence analysis	7	John Harshman
30 Jun 21:32	Re: Luskin still doesn't get DNA sequence analysis	6	RonO
1 Jul 04:02	Re: Luskin still doesn't get DNA sequence analysis	5	John Harshman
1 Jul 16:07	Re: Luskin still doesn't get DNA sequence analysis	4	RonO
2 Jul 05:31	Re: Luskin still doesn't get DNA sequence analysis	3	John Harshman
2 Jul 14:52	Re: Luskin still doesn't get DNA sequence analysis	2	RonO
2 Jul 16:24	Re: Luskin still doesn't get DNA sequence analysis	1	John Harshman