There are tens of thousands of genes in the human genome: minuscule twists of DNA and RNA that combine to express all the traits and characteristics that make each of us unique. Each gene has a name and an alphanumeric code, called a symbol, that scientists use to coordinate research. But in the past year or so, about 27 human genes have been renamed, all because Microsoft Excel kept misreading their symbols as dates.
The problem is not as surprising as it first seems. Excel is a behemoth in the spreadsheet world, and scientists often use it to track their work and even conduct clinical trials. But its default settings were designed with more mundane applications in mind, so when a user enters a gene's alphanumeric symbol into a spreadsheet, such as MARCH1 (short for "Membrane Associated Ring-CH-Type Finger 1"), Excel converts it to a date: 1-Mar (March 1).
This is extremely frustrating, even dangerous: it corrupts data that scientists then have to sort through and restore by hand. The problem is also surprisingly widespread and even affects peer-reviewed scientific work. A 2016 study examined genetic data shared alongside 3,597 published papers and found that about one-fifth of it had been affected by Excel errors.
There is no easy fix for this error. Excel does not offer an option to turn off this auto-formatting, and the only way to avoid it is to change the data type of each column. Even then, a scientist may fix their own data, but as soon as someone else opens the same spreadsheet in Excel without taking the same precaution, the errors are reintroduced.
But help has arrived in the form of the HUGO Gene Nomenclature Committee (HGNC), the scientific body responsible for standardizing gene names. This week, the HGNC released new gene naming guidelines, including guidance on symbols that affect data handling and retrieval. From now on, human genes and the proteins they express will be named with Excel's auto-formatting in mind. The symbol MARCH1 is now MARCHF1, SEPT1 has become SEPTIN1, and so on. The HGNC will keep records of old symbols and names to avoid future confusion. About 27 genes have been renamed in this way over the past year, but the guidelines themselves were not officially released until this week.
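For readers who keep their own lists of gene symbols, the idea behind the renaming can be sketched as a quick check for symbols that resemble a month followed by a day. The Python below is only an illustration: the month list, the pattern, and the function names `looks_like_date` and `flag_risky_symbols` are assumptions made for this sketch, not Excel's actual conversion rules or anything published by the HGNC.

```python
import re

# Month names and abbreviations that a spreadsheet might read as the start
# of a date. This list is an illustrative assumption, not Excel's real rules.
_MONTHS = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month token, an optional hyphen, then a day number from 1 to 31,
# for example MARCH1 or SEPT1.
_DATE_LIKE = re.compile(
    r"^(?:" + "|".join(_MONTHS) + r")-?(?:[1-9]|[12][0-9]|3[01])$",
    re.IGNORECASE,
)


def looks_like_date(symbol: str) -> bool:
    """Return True if a gene symbol resembles a month-plus-day date."""
    return bool(_DATE_LIKE.match(symbol.strip()))


def flag_risky_symbols(symbols):
    """Return the symbols that a spreadsheet may silently turn into dates."""
    return [s for s in symbols if looks_like_date(s)]


if __name__ == "__main__":
    # The old symbols are flagged; the renamed ones and TP53 are not.
    print(flag_risky_symbols(["MARCH1", "SEPT1", "MARCHF1", "SEPTIN1", "TP53"]))
```

Under these assumptions, the old symbols MARCH1 and SEPT1 are flagged, while their replacements MARCHF1 and SEPTIN1 pass, because the extra letters break the month-plus-number shape.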