Testing for a Float in a Vector
Testing floating point numbers for equality is always challenging when programming (see https://en.wikipedia.org/wiki/Floating-point_arithmetic#Accuracy_problems). One way to decide whether two numbers are equal is to set a precision and treat values that differ by no more than that precision as equal. In this short snippet I create a function (%noVwithin%) to determine whether a number exists in a vector of floating point numbers. The function is then vectorized so that it can be used in tidyverse expressions.
Function - %noVwithin%
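# Largest difference at which two numbers are still treated as equal
precision <- 1e-10

# TRUE if x is within `precision` of any element of y (expects a single x)
`%noVwithin%` <- function(x, y) {
  any(
    sapply(y, function(z) {
      abs(x - z) <= precision
    })
  )
}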
The function %noVwithin% takes two arguments and checks whether x is within the vector y. It is similar to the R function %in% but works with floating point numbers. The example below illustrates both %in% and %noVwithin%:
a <- 0.8
b <- 0.4
val <- a + b
print(val %in% c(1.1, 1.2, 1.3))
# [1] FALSE
print(val %noVwithin% c(1.1, 1.2, 1.3))
# [1] TRUE
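The function above reads its tolerance from the global variable precision. As a minimal sketch of one possible variation, the tolerance can be passed as an argument instead. The name within_tol and the tol parameter are hypothetical and are not part of the original snippet; the comparison also relies on R's element-wise subtraction rather than sapply().

```r
# Hypothetical helper (not from the original post): same check as %noVwithin%
# for a single x, with the tolerance supplied as an argument.
within_tol <- function(x, y, tol = 1e-10) {
  # abs(x - y) is computed element-wise over y, so no explicit loop is needed
  any(abs(x - y) <= tol)
}

val <- 0.8 + 0.4
within_tol(val, c(1.1, 1.2, 1.3))               # TRUE with the default tolerance
within_tol(val, c(1.1, 1.2, 1.3), tol = 1e-17)  # FALSE: the tolerance is too tight
```

Like %noVwithin%, this sketch expects a single value for x.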
Vectorizing the Function
Our function works for a single value but gives the wrong result when used in a tidyverse pipe:
d <- tibble::tibble(
  a = seq(0.1, 0.5, 0.1),
  b = seq(1.1, 1.5, 0.1),
  val = a + b
)

# # A tibble: 5 × 3
#       a     b   val
#   <dbl> <dbl> <dbl>
# 1   0.1   1.1   1.2
# 2   0.2   1.2   1.4
# 3   0.3   1.3   1.6
# 4   0.4   1.4   1.8
# 5   0.5   1.5   2

myVals <- seq(1, 1.6, 0.2)
# [1] 1.0 1.2 1.4 1.6

d |>
  dplyr::mutate(in_myVals = val %noVwithin% myVals)
# # A tibble: 5 × 4
#       a     b   val in_myVals
#   <dbl> <dbl> <dbl> <lgl>
# 1   0.1   1.1   1.2 TRUE
# 2   0.2   1.2   1.4 TRUE
# 3   0.3   1.3   1.6 TRUE
# 4   0.4   1.4   1.8 TRUE
# 5   0.5   1.5   2   TRUE
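Here, val is identified as present in myVals even when it is not. Inside mutate() the whole val column is passed as x, so any() collapses every comparison into a single TRUE, which is then recycled to all rows.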
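One way to see why every row comes back TRUE is to call the function on a whole vector outside of a pipe. The sketch below repeats the definition of %noVwithin% so that it runs on its own, and uses literal vectors as stand-ins for the val column and myVals (an assumption made for brevity; the post builds them with seq()).

```r
precision <- 1e-10

`%noVwithin%` <- function(x, y) {
  any(sapply(y, function(z) abs(x - z) <= precision))
}

# Literal stand-ins for d$val and myVals (assumed values for illustration)
val <- c(1.2, 1.4, 1.6, 1.8, 2.0)
myVals <- c(1.0, 1.2, 1.4, 1.6)

res <- val %noVwithin% myVals
length(res)  # 1, not 5: any() collapses all 20 comparisons into one value
res          # TRUE, because at least one pair matches
```

A length-one result assigned to a column is repeated for every row, which matches the all-TRUE column shown above.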
Vectorizing is simple. We just pass the function to Vectorize(), along with a character vector of the argument names that we wish to vectorize. In this case we vectorize only x, since y is the same vector for every call.
`%within%` <- Vectorize(`%noVwithin%`, vectorize.args = "x")
Our new function, %within%, is the vectorized version. Running the code above with %within% gives the expected result.
d |>
  dplyr::mutate(in_myVals = val %within% myVals)

# # A tibble: 5 × 4
#       a     b   val in_myVals
#   <dbl> <dbl> <dbl> <lgl>
# 1   0.1   1.1   1.2 TRUE
# 2   0.2   1.2   1.4 TRUE
# 3   0.3   1.3   1.6 TRUE
# 4   0.4   1.4   1.8 FALSE
# 5   0.5   1.5   2   FALSE
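Vectorize() wraps the function in a loop over x. As a hedged alternative sketch, the same element-wise result can be computed in one step with base R's outer(), which builds the full matrix of pairwise differences. The operator name %withinOuter% is hypothetical and the input vectors are literal stand-ins for the val column and myVals; neither is part of the original post.

```r
precision <- 1e-10

# Hypothetical alternative (not from the original post)
`%withinOuter%` <- function(x, y) {
  # length(x) by length(y) matrix of absolute pairwise differences
  diffs <- abs(outer(x, y, "-"))
  # an element of x matches if any difference in its row is within the tolerance
  rowSums(diffs <= precision) > 0
}

# Literal stand-ins for d$val and myVals (assumed values for illustration)
val <- c(1.2, 1.4, 1.6, 1.8, 2.0)
myVals <- c(1.0, 1.2, 1.4, 1.6)

val %withinOuter% myVals
# [1]  TRUE  TRUE  TRUE FALSE FALSE
```

The trade-off is memory: the matrix has length(x) * length(y) entries, so this form suits a short lookup vector such as myVals.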
Using the %in% Function
Running the above code with the base R %in% function (which, like many base R functions, is vectorized) in place of %within% produces an interesting output:
d |>
  dplyr::mutate(in_myVals = val %in% myVals)
# # A tibble: 5 × 4
#       a     b   val in_myVals
#   <dbl> <dbl> <dbl> <lgl>
# 1   0.1   1.1   1.2 FALSE
# 2   0.2   1.2   1.4 FALSE
# 3   0.3   1.3   1.6 TRUE
# 4   0.4   1.4   1.8 FALSE
# 5   0.5   1.5   2   FALSE
Everything is FALSE, as expected, except for 1.6. Looking at val and myVals illustrates why.
Here are the values of val at 20 decimal places:
d$val |> formatC(digits = 20, format = 'f')
# [1] "1.20000000000000017764" "1.40000000000000013323"
# [3] "1.60000000000000008882" "1.80000000000000026645"
# [5] "2.00000000000000000000"
and here are the values stored in the myVals vector:
myVals |> formatC(digits = 20, format = 'f')
# [1] "1.00000000000000000000" "1.19999999999999995559"
# [3] "1.39999999999999991118" "1.60000000000000008882"
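It's interesting to note that both stored values for 1.6 (d[3, ]$val and myVals[4]) are identical, which is why the exact %in% comparison succeeds for 1.6. For 1.2 and 1.4 the two stored values differ in the last binary digit, so %in% returns FALSE.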
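The same effect can be reproduced with the scalar example from the start of the post, where 0.8 + 0.4 is compared with 1.2. The short sketch below uses only base R and shows that the two stored values are one representable step apart, which is far below the 1e-10 precision used by the tolerance check.

```r
x <- 0.8 + 0.4

x == 1.2                        # FALSE: the stored values are not identical
x - 1.2 == .Machine$double.eps  # TRUE: they differ by one step (2^-52) at this magnitude
abs(x - 1.2) <= 1e-10           # TRUE: well within the chosen precision
```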
Alternative approaches
dplyr::rowwise()
The non-vectorized version works when used in conjunction with dplyr::rowwise(), as rowwise() computes one row at a time.
d |>
  dplyr::rowwise() |>
  dplyr::mutate(in_myVals = val %noVwithin% myVals)

# # A tibble: 5 × 4
#       a     b   val in_myVals
#   <dbl> <dbl> <dbl> <lgl>
# 1   0.1   1.1   1.2 TRUE
# 2   0.2   1.2   1.4 TRUE
# 3   0.3   1.3   1.6 TRUE
# 4   0.4   1.4   1.8 FALSE
# 5   0.5   1.5   2   FALSE
purrr::map
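The purrr::map() family of functions can apply a non-vectorized function to each element within a mutate(). Here map_lgl() calls %noVwithin% once per value of val, passing myVals as the second argument, and returns a logical vector.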
d |>
  dplyr::mutate(in_myVals = purrr::map_lgl(val, `%noVwithin%`, myVals))

# # A tibble: 5 × 4
#       a     b   val in_myVals
#   <dbl> <dbl> <dbl> <lgl>
# 1   0.1   1.1   1.2 TRUE
# 2   0.2   1.2   1.4 TRUE
# 3   0.3   1.3   1.6 TRUE
# 4   0.4   1.4   1.8 FALSE
# 5   0.5   1.5   2   FALSE
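For readers who prefer to avoid an extra dependency, base R's vapply() plays a similar role to purrr::map_lgl(). The sketch below is an illustration rather than part of the original post: it repeats the definition of %noVwithin% so that it runs on its own, and uses literal vectors as stand-ins for the val column and myVals.

```r
precision <- 1e-10

`%noVwithin%` <- function(x, y) {
  any(sapply(y, function(z) abs(x - z) <= precision))
}

# Literal stand-ins for d$val and myVals (assumed values for illustration)
val <- c(1.2, 1.4, 1.6, 1.8, 2.0)
myVals <- c(1.0, 1.2, 1.4, 1.6)

# Call the non-vectorized function once per element; logical(1) declares
# that each call must return exactly one logical value
in_myVals <- vapply(val, function(v) v %noVwithin% myVals, logical(1))
in_myVals
# [1]  TRUE  TRUE  TRUE FALSE FALSE
```

The same vapply() call could be placed inside dplyr::mutate() in place of purrr::map_lgl().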