Genesweep

A Short Exploration of How Scientists Guess


This story revolves around one lab.

It's a biology lab. Picture test tubes, microscopes, that kind of thing.

But it's not just any old lab.

See, Cold Spring Harbor Laboratory was led for decades by James Watson (who shared the Nobel Prize for discovering the structure of DNA, and then became racist).


It's a prestigious lab.

And because of its prestige, Cold Spring Harbor Laboratory (CSHL) gets to be involved in all the important stuff in biology.

Importantly, for this project, that includes the Human Genome Project [1]. From 1999 to 2002, CSHL played host to the National Genomic Conferences, where the best and brightest in the field would come and talk about the state of sequencing the human genome.

[1] This project was a pretty big deal, almost to the point of being a thesis in itself. If successful, the researchers thought, it could usher in an era of personalized medicine. The government, which was competing with private firms at the time, gave the project almost $5 billion, making it one of the largest and best-funded scientific enterprises to date.

Pictures from the National Genomic Conferences in 2000, courtesy of the CSHL archives.

But importantly for our story, there's also a bar at CSHL.

And people play games in bars [2].

[2] The bar at CSHL is called The Eagle, evoking the public house in Cambridge of the same name. According to legend, Francis Crick and James Watson interrupted lunch at the Eagle in Cambridge on February 28th, 1953 to announce their proposal for the structure of DNA.









The Bar at CSHL











Bar games are interesting, especially when groups of scientists play them.

There have been a couple of high-profile scientific wagers, but in this instance, because we have several data points, we can lean on measures of central tendency (such as the mean or median guess) to get a sense of how the expectations of the scientific community as a whole changed throughout the project.
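To make that concrete, here is a minimal sketch of summarizing a betting pool year by year. The records below are invented placeholders, not the real entries, and the `summarize_by_year` helper is a hypothetical name of my own; the point is only to show how a mean, a median, and a maximum per year would let you watch the crowd's expectation drift.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical placeholder records: (year the bet was placed, guessed gene count).
# These numbers are invented for illustration and are NOT the real pool entries.
guesses = [
    (2000, 40_000), (2000, 60_000), (2000, 120_000),
    (2001, 30_000), (2001, 45_000), (2001, 70_000),
    (2002, 25_000), (2002, 28_000), (2002, 50_000),
]


def summarize_by_year(records):
    """Return {year: (mean, median, max)} for the guesses placed in each year."""
    by_year = defaultdict(list)
    for year, guess in records:
        by_year[year].append(guess)
    return {
        year: (mean(values), median(values), max(values))
        for year, values in sorted(by_year.items())
    }


for year, (avg, mid, top) in summarize_by_year(guesses).items():
    print(f"{year}: mean={avg:,.0f} median={mid:,.0f} max={top:,}")
```

The median is the safer of the two summaries here, since a single very high guess drags the mean upward, while the maximum tracks the high end of the distribution directly.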

"Dr. Dear was asked how he had predicted such a low number three years ago when numbers around 50,000 were popular. It was late at night, he said, and he had been drinking in a bar. Second, at that hour, people's behavior did not seem so different from that of fruit flies, then known to have 13,500 genes, implying that twice that number would be ample for humans. Third, his birth date, April 27, 1962, immediately suggested 27,462 as the correct number."—The New York Times

And those expectations are important! In case you haven't noticed, we are definitely not living in an era of genetically personalized medicine. And that's largely because scientists still don't have a definitive consensus about how many genes are in the human genome, or even about what a gene really is. It depends on who you ask.

But what is clear is that the number of genes in the human genome is much, much lower than scientists had originally expected. And we can actually see that consensus emerge in this dataset. Keep your eye on the right-hand extreme.

The Bar at CSHL



My idea was to combine this data with data from the web on the scientists' individual publications and research keywords, in the hope of showing some sort of meso-level, between-group variance among these scientists. A lot of literature suggests that there would be a difference between the guesses of, for example, a biologist who studies plants or worms and a biologist who studies something bigger, like humans.
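For the curious, one simple way to look for that kind of group effect is to split the total spread of guesses into a between-group part and a within-group part. The sketch below is an assumption about how such a check could be done, not a record of the analysis I ran; the field labels, the numbers, and the `variance_decomposition` helper are all hypothetical.

```python
from statistics import mean

# Hypothetical placeholder data: guessed gene counts keyed by research area.
# Both the group labels and the numbers are invented for illustration.
guesses_by_field = {
    "plants": [30_000, 35_000, 40_000],
    "worms": [25_000, 30_000, 35_000],
    "humans": [50_000, 60_000, 70_000],
}


def variance_decomposition(groups):
    """Split the total sum of squares into between-group and within-group parts."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    # Between: how far each group's mean sits from the overall mean.
    between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Within: how far each guess sits from its own group's mean.
    within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )
    return between, within


between, within = variance_decomposition(guesses_by_field)
share_between = between / (between + within)
print(f"share of variance between fields: {share_between:.2f}")
```

A share close to 1 would mean a scientist's field predicts most of the variation in guesses; a share close to 0 would mean field tells you very little.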


But I haven't found anything too convincing yet.


Thanks for reading anyway.