Checking internal consistency of CAS numbers
CAS numbers consist of three numbers separated by dashes. The first comprises of two to seven digits, the second is two digits and the third is one digit. The last number is, in fact, a check digit for the rest of the code. It's simple to check if a CAS number contains an error by calculating a checksum as follows:
- Remove the last digit and all dashes
- Reverse the rest of the digits
- Multiply the first digit by one, the second by two, and so on
- Sum the multiplied values
- Take the modulo of ten of this sum
- This should equal the check digit
For example, the CAS number for ethanol is 64-17-5.
Checking: (7 x 1) + (1 x 2) + (4 x 3) + (6 x 4) = 45. 45 mod 10 = 5 (which is the check digit).
As a function in R this is:
1cas_check <- function(cas) {
2 cas_split <- str_split(cas, '-')[[1]]
3 cas_validation <- as.numeric(cas_split[3])
4 cas_to_check <- as.numeric(rev(strsplit(paste0(cas_split[1], cas_split[2]), '')[[1]]))
5 cas_sum <- sum(cas_to_check * seq(length(cas_to_check)))
6 return(cas_sum %% 10 == cas_validation)
7}