Testing for a Float in a Vector

Testing floating-point numbers for equality is always challenging when programming (see https://en.wikipedia.org/wiki/Floating-point_arithmetic#Accuracy_problems). One way to decide whether two numbers are equal is to choose a precision (a tolerance) and treat them as equal when they differ by no more than that amount. In this short snippet I create a function, %noVwithin%, that determines whether a number exists in a vector of floating-point numbers. The function is then vectorized so that it can be used in tidyverse expressions such as dplyr::mutate().
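The following base R sketch illustrates the problem and the tolerance idea. The value 1e-10 matches the precision used below. The variable names are illustrative only.

```r
# Exact comparison fails because 0.1, 0.2 and 0.3 have no exact binary representation
x <- 0.1 + 0.2
x == 0.3
# [1] FALSE

# Comparing the absolute difference against a tolerance treats them as equal
tolerance <- 1e-10
abs(x - 0.3) <= tolerance
# [1] TRUE
```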

Function - %noVwithin%

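# Two numbers are treated as equal if they differ by no more than this
precision <- 1e-10

# TRUE if the single number x is within `precision` of any element of y
`%noVwithin%` <- function(x, y) {
  any(
    sapply(y, function(z) {
      abs(x - z) <= precision
    })
  )
}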

The function %noVwithin% takes two arguments and checks whether the number x is within precision of any element of the vector y. It is similar to R's %in% operator, but it tolerates the small rounding errors of floating-point arithmetic. The example below compares %in% and %noVwithin%:

a <- 0.8
b <- 0.4
val <- a + b
print(val %in% c(1.1, 1.2, 1.3))
# [1] FALSE
print(val %noVwithin% c(1.1, 1.2, 1.3))
# [1] TRUE
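When x is a single number, abs(x - y) is already vectorized over y, so the sapply() call is not strictly needed. The sketch below is an equivalent check written as an ordinary function. The name near_any() and its tol argument are hypothetical and not part of the original. Passing the tolerance as an argument avoids reading it from the global precision variable.

```r
# Hypothetical helper: TRUE if the single number x is within tol of any element of y
near_any <- function(x, y, tol = 1e-10) {
  stopifnot(length(x) == 1)  # like %noVwithin%, this expects one value of x
  any(abs(x - y) <= tol)
}

val <- 0.8 + 0.4
near_any(val, c(1.1, 1.2, 1.3))
# [1] TRUE
near_any(val, c(1.1, 1.2, 1.3), tol = 0)
# [1] FALSE
```

Setting tol = 0 reproduces the exact comparison that %in% makes.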

Vectorizing the Function

Our function works on a single value but gives the wrong answer when used inside a tidyverse pipe:

d <- tibble::tibble(
  a = seq(0.1, 0.5, 0.1),
  b = seq(1.1, 1.5, 0.1),
  val = a + b
)
d
# # A tibble: 5 × 3
#       a     b   val
#   <dbl> <dbl> <dbl>
# 1   0.1   1.1   1.2
# 2   0.2   1.2   1.4
# 3   0.3   1.3   1.6
# 4   0.4   1.4   1.8
# 5   0.5   1.5   2

myVals <- seq(1, 1.6, 0.2)
myVals
# [1] 1.0 1.2 1.4 1.6

d |>
  dplyr::mutate(in_myVals = val %noVwithin% myVals)
# # A tibble: 5 × 4
#       a     b   val in_myVals
#   <dbl> <dbl> <dbl> <lgl>
# 1   0.1   1.1   1.2 TRUE
# 2   0.2   1.2   1.4 TRUE
# 3   0.3   1.3   1.6 TRUE
# 4   0.4   1.4   1.8 TRUE
# 5   0.5   1.5   2   TRUE

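Here every row is marked TRUE, even though 1.8 and 2 are not in myVals. Inside mutate(), x is the whole val column rather than a single number. As a result, any() returns one TRUE for the entire column, and mutate() recycles it to every row.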
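This base R sketch shows what %noVwithin% sees when it receives the whole column. It recreates val (computed as a + b) and myVals without dplyr and inspects the intermediate sapply() result. That result is a 5 × 4 logical matrix, with one row per value of val and one column per element of myVals.

```r
precision <- 1e-10
`%noVwithin%` <- function(x, y) {
  any(sapply(y, function(z) abs(x - z) <= precision))
}

# Same values as d$val and myVals above
val <- seq(0.1, 0.5, 0.1) + seq(1.1, 1.5, 0.1)
myVals <- seq(1, 1.6, 0.2)

# One row per element of val, one column per element of myVals
hits <- sapply(myVals, function(z) abs(val - z) <= precision)
dim(hits)
# [1] 5 4

# any() reduces all 20 comparisons to a single value
val %noVwithin% myVals
# [1] TRUE
```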

Vectorizing is simple. We pass the function to Vectorize() together with vectorize.args, a character vector naming the arguments to vectorize over. Here we vectorize over x only, because y (the vector of reference values) should be passed whole to every call.

`%within%` <- Vectorize(`%noVwithin%`, vectorize.args = "x")

Our new function, %within%, is the vectorized version. Running the earlier mutate() call with %within% in place of %noVwithin% gives the expected result.

d |>
  dplyr::mutate(in_myVals = val %within% myVals)

# # A tibble: 5 × 4
#       a     b   val in_myVals
#   <dbl> <dbl> <dbl> <lgl>
# 1   0.1   1.1   1.2 TRUE
# 2   0.2   1.2   1.4 TRUE
# 3   0.3   1.3   1.6 TRUE
# 4   0.4   1.4   1.8 FALSE
# 5   0.5   1.5   2   FALSE
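Roughly speaking, Vectorize() builds a wrapper that calls mapply() over the arguments named in vectorize.args and passes the remaining arguments through unchanged. The sketch below writes that idea out by hand. The name %within_mapply% is hypothetical. On the values from the table above, it gives the same result as %within%.

```r
precision <- 1e-10
`%noVwithin%` <- function(x, y) {
  any(sapply(y, function(z) abs(x - z) <= precision))
}
`%within%` <- Vectorize(`%noVwithin%`, vectorize.args = "x")

# Hypothetical manual equivalent: loop over x with mapply(), keeping y fixed via MoreArgs
`%within_mapply%` <- function(x, y) {
  mapply(`%noVwithin%`, x, MoreArgs = list(y = y))
}

val <- seq(0.1, 0.5, 0.1) + seq(1.1, 1.5, 0.1)
myVals <- seq(1, 1.6, 0.2)

val %within% myVals
# [1]  TRUE  TRUE  TRUE FALSE FALSE
val %within_mapply% myVals
# [1]  TRUE  TRUE  TRUE FALSE FALSE
```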

Using the %in% Function

Running the above code with the base R %in% function (which, like many base R functions, is vectorized) in place of %within% produces an interesting output:

d |>
  dplyr::mutate(in_myVals = val %in% myVals)
# # A tibble: 5 × 4
#       a     b   val in_myVals
#   <dbl> <dbl> <dbl> <lgl>
# 1   0.1   1.1   1.2 FALSE
# 2   0.2   1.2   1.4 FALSE
# 3   0.3   1.3   1.6 TRUE
# 4   0.4   1.4   1.8 FALSE
# 5   0.5   1.5   2   FALSE

Every row is FALSE, as expected, except the one for 1.6. Printing val and myVals with more digits shows why.

Here are the values of val at 20 decimal places:

d$val |> formatC(digits = 20, format = 'f')
# [1] "1.20000000000000017764" "1.40000000000000013323"
# [3] "1.60000000000000008882" "1.80000000000000026645"
# [5] "2.00000000000000000000"

and here are the values stored in the myVals vector:

myVals |> formatC(digits = 20, format = 'f')
# [1] "1.00000000000000000000" "1.19999999999999995559"
# [3] "1.39999999999999991118" "1.60000000000000008882"

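It's interesting to note that the two values for 1.6 (d[3, ]$val and myVals[4]) are identical, so the exact comparison made by %in% succeeds for 1.6. The values for 1.2 and 1.4 differ in their last digits, so %in% reports them as missing.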
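The near misses are tiny. The base R sketch below recomputes val as a + b outside the tibble and compares the 1.2, 1.4 and 1.6 entries directly. For 1.2 and 1.4, the two sides are neighbouring doubles. They are one unit in the last place apart, which for numbers between 1 and 2 is .Machine$double.eps (about 2.2e-16). That gap is far inside the 1e-10 tolerance used by %within%.

```r
val <- seq(0.1, 0.5, 0.1) + seq(1.1, 1.5, 0.1)
myVals <- seq(1, 1.6, 0.2)

# Exact comparisons: only the 1.6 values are bit-for-bit identical
val[1:3] == myVals[2:4]
# [1] FALSE FALSE  TRUE

# Size of each mismatch, in units of .Machine$double.eps
(val[1:3] - myVals[2:4]) / .Machine$double.eps
# [1] 1 1 0
```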

Alternative approaches

dplyr::rowwise()

The non-vectorized version also works when combined with dplyr::rowwise(). With rowwise(), mutate() evaluates its expression one row at a time, so x is a single value on each call.

d |>
  dplyr::rowwise() |>
  dplyr::mutate(in_myVals = val %noVwithin% myVals)

# # A tibble: 5 × 4
#       a     b   val in_myVals
#   <dbl> <dbl> <dbl> <lgl>
# 1   0.1   1.1   1.2 TRUE
# 2   0.2   1.2   1.4 TRUE
# 3   0.3   1.3   1.6 TRUE
# 4   0.4   1.4   1.8 FALSE
# 5   0.5   1.5   2   FALSE

purrr::map

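The purrr::map() family can also apply a non-vectorized function within mutate(). Here map_lgl() calls %noVwithin% once for each element of val, passes myVals as the y argument, and returns a logical vector.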

d |>
  dplyr::mutate(in_myVals = purrr::map_lgl(val, `%noVwithin%`, myVals))

# # A tibble: 5 × 4
#       a     b   val in_myVals
#   <dbl> <dbl> <dbl> <lgl>
# 1   0.1   1.1   1.2 TRUE
# 2   0.2   1.2   1.4 TRUE
# 3   0.3   1.3   1.6 TRUE
# 4   0.4   1.4   1.8 FALSE
# 5   0.5   1.5   2   FALSE
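Without purrr, base R's vapply() plays much the same role as map_lgl(). It calls %noVwithin% once per element of val and passes myVals through as y. It also checks that each call returns a single logical value. The sketch below recreates the values outside the tibble.

```r
precision <- 1e-10
`%noVwithin%` <- function(x, y) {
  any(sapply(y, function(z) abs(x - z) <= precision))
}

val <- seq(0.1, 0.5, 0.1) + seq(1.1, 1.5, 0.1)
myVals <- seq(1, 1.6, 0.2)

# FUN.VALUE = logical(1) makes vapply() stop if any call returns something else
in_myVals <- vapply(val, `%noVwithin%`, logical(1), y = myVals)
in_myVals
# [1]  TRUE  TRUE  TRUE FALSE FALSE
```

The same vapply() call can be used inside mutate() in place of purrr::map_lgl().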