Cirrhosis of the UCI Liver Data-set

The BUPA Liver data-set, donated in 1990 by Richard S.~Forsyth to the UCI Machine Learning Repository, has been used as a classification benchmark in many hundreds of machine learning papers. However, the large majority misinterpret this dataset: they take the final variable as a dependent variable indicating presence or absence of a liver disease. They try to learn classifiers which predict its value based on the other, independent variables. However, this is incorrect. The final variable in the dataset is a “selector”, intended to indicate a train-test split. It does not indicate presence or absence of liver disease, or any other biomedical property. Hence, results on this dataset are meaningless.

Most existing papers which misinterpret the dataset in this way are not greatly damaged by their misunderstanding, because they are simply using it as one benchmark among many, and they are not making claims about clinical practice. Nevertheless, it’s interesting to see the range of venues where the error has occurred, including: Pattern Recognition Letters, ICML, and NIPS.

However, a few authors have interpreted their results as saying something about the diagnosis and treatment of liver disorders. Such claims appear in biomedical-oriented publication venues such as: Journal of Medical Systems, and Artificial Intelligence in Medicine. They are significantly damaged by their misinterpretation.

Richard Forsyth (involved in the original work on the dataset) and I wrote a short letter to Pattern Recognition Letters to point this problem out, published here.

We’ve been contacting various sites where the incorrect interpretation is repeated, including UCI itself, openml.org, and some R packages.

(EDIT 8 June 2016: since publishing the letter, we’ve found another instance of the misunderstanding, and this is worth mentioning because it is a well-known article by one of the biggest names in statistical machine learning: Breiman’s Two Cultures article.)