Subject : Re: Simplicity And Computing
From : ldo (at) *nospam* nz.invalid (Lawrence D'Oliveiro)
Newsgroups : comp.misc
Date : 24 Apr 2024, 02:09:03
Organization : A noiseless patient Spider
Message-ID : <v09m3f$1ue86$2@dont-email.me>
References : 1 2 3
User-Agent : Pan/0.155 (Kherson; fc5a80b8)
On 24 Apr 2024 00:37:54 -0000, Scott Dorsey wrote:
> We once gave a tour of our supercomputing cluster to some of the
> organization IT managers, and someone honestly asked if we ran Excel on
> it. This is honestly how IT people think computers are used.
You’d think scientists, in particular, would know better. A few years
ago, geneticists undertook to rename a bunch of genes, because--wait
for it--Excel was misinterpreting the existing names as dates.
Geneticists using Excel to analyze their data?? But there you go.
Here
<https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008984>
is a report on how the situation has improved since then.
Spoiler: it hasn’t.
I absolutely love the recommendations that they make. The first one is
a biggie:
    Scripted analyses are preferred over spreadsheets. Gene name to
    date conversion is a bug specific to spreadsheets and doesn’t
    occur in scripted computer languages like Python or R. In
    addition, analyses conducted with Python and R notebooks (eg:
    Jupyter or Rmarkdown) capture computational methods and results in
    a stepwise fashion meaning these workflows can be more readily
    audited. These notebooks can therefore achieve a higher level of
    computational reproducibility than spreadsheets. Although this
    requires a big investment in learning a computer language, this
    investment pays off in the longer term.
Note that bit: “capture computational methods and results in a
stepwise fashion meaning these workflows can be more readily audited”.
Here I thought reproducibility was an absolutely non-negotiable
foundation stone of scientific research, yet it seems people have been
publishing results with nothing to back up their analyses other than
an Excel spreadsheet.
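For what it's worth, the "doesn’t occur in scripted computer languages"
claim from the quoted recommendation is easy to check for yourself. What
follows is only a rough sketch, not anything taken from the paper: the
sample rows are made up for illustration, using gene symbols of the kind
Excel has been reported to turn into dates, and the column names "gene"
and "count" are my own invention.

```python
import csv
import io

# Hypothetical sample data (not from the paper): gene symbols of the
# kind that spreadsheets have been reported to reinterpret as dates.
raw = "gene,count\nSEPT2,10\nMARCH1,7\nDEC1,3\n"

# The csv module hands back every field as a plain string; it makes no
# attempt to guess that "SEPT2" or "MARCH1" might be a date.
rows = list(csv.DictReader(io.StringIO(raw)))
genes = [row["gene"] for row in rows]

print(genes)  # ['SEPT2', 'MARCH1', 'DEC1']
```

Any conversion (the counts to integers, say) has to be written out
explicitly in the script, which is rather the point: every step that
touches the data is there to be read and audited.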