Checking internal consistency of CAS numbers

Checking internal consistency of CAS numbers
R
Chemistry
Author

Harvey

Published

December 26, 2019

CAS numbers consist of three numbers separated by dashes. The first comprises of two to seven digits, the second is two digits and the third is one digit. The last number is, in fact, a check digit for the rest of the code. It’s simple to check if a CAS number contains an error by calculating a checksum as follows:

For example, the CAS number for ethanol is 64-17-5.
Checking: (7 x 1) + (1 x 2) + (4 x 3) + (6 x 4) = 45. 45 mod 10 = 5 (which is the check digit).

As a function in R this is:

cas_check <- function(cas) {
  cas_split <- strsplit(cas, '-')[[1]]
  cas_validation <- as.numeric(cas_split[3])
  cas_to_check <- as.numeric(rev(strsplit(paste0(cas_split[1], cas_split[2]), '')[[1]]))
  cas_sum <- sum(cas_to_check * seq(length(cas_to_check)))
  return(cas_sum %% 10 == cas_validation)
}
cas_check("64-17-5")
[1] TRUE