Beer Trip Around the World – Harvey Lieberman

The traveling salesperson problem (TSP) is an NP-complete problem that asks given a list of cities, what’s the shortest path that can be taken so that each city is visited once and you end up back where you started. Exact solutions to the problem exist but they become impractical as soon as the number of cities increases above 20 therefore approximations are generally performed that yield a suitable, if not best, solution.

Here we take a fun approach from a thought experiment. What if I could idenfity the best beers in the World and drink them in the breweries in which they are brewed. With a limited time and budget, wouldn’t this be a traveling salesperson problem? With this is mind I quickly drew up the following solution using the {TSP} R package.

Step 1 - Gather and wrangle the data

Data were retrieved from Kaggle. Kaggle contains a few beer datasets but the one chosen (https://www.kaggle.com/ehallmar/beers-breweries-and-beer-reviews) contains beers, reviews and breweries, making our life much easier. Once data were downloaded we could identify top-rated beers and join brewery details to produce a single, neat data frame.

library(readr)
library(dplyr)
library(here)

## read in beer data
df_beers <- read_csv(here("data/beers.csv"))

## read in beer ratings
df_ratings <- read_csv(here("data/reviews.csv"))

## read in breweries
df_breweries <- read_csv(here("data/breweries.csv"))

## summarize ratings by beer and sort
df_rating_summary <- df_ratings %>%
  group_by(beer_id) %>%
  summarise(rating = mean(score, na.rm = TRUE)) %>%
  arrange(desc(rating)) %>%
  filter(rating >= 4.75)

## add ratings summary to beer data frame
df_beer_ratings <- df_beers %>%
  right_join(df_rating_summary, by = c("id" = "beer_id")) %>%
  filter(retired == FALSE)

## add brewery details
df_beer_ratings <- df_beer_ratings %>% 
  select(id, beer=name, brewery_id, rating, abv) %>%
  left_join(df_breweries %>% select(id, brewery=name, city, state, country), by = c("brewery_id" = "id")) %>% 
  arrange(desc(rating))

## limit to single occurrence of each brewery
df_beer_ratings <- df_beer_ratings %>%
  group_by(brewery_id) %>%
  arrange(desc(rating), desc(abv)) %>%
  slice(1) %>%
  ungroup()

Step 2 - Convert locations to longitude/latitude

Now we have a table with locations we can easily convert the locations to longitude/latitude values which will be used to calculate distances. First we need to convert the county code from a two-character abbreviation and then we can use {tidygeocoder} to convert to longitude and latitude.

library(countrycode)
library(tidygeocoder)

## build city addresses
df_beer_ratings <- df_beer_ratings %>%
  mutate(country_name = countrycode(country, origin = "iso2c", destination = "country.name")) %>%
  mutate(address = if_else(country == "US", paste(city, state, "USA", sep = ", "), paste(city, country_name, sep = ", ")))

## take the first 300 locations
df_use_locations <- df_beer_ratings %>%
  slice(1:300)

## determine longitude and latitude
df_address <- geo(df_use_locations$address, method = "cascade")

## bind locations to data frame
df_use_locations <- bind_cols(df_use_locations, df_address %>% select(lat, long)) %>%
  filter(!is.na(lat), !is.na(long))

Step 3 - Build a distance matrix and run TSP

Once we have a list of longitude/latitude pairs we can build a distance matrix of the distances between each location (default measurement = meters). From this we can solve the traveling salesperson problem. In this example TSP is solved 1000 times and the distance (in miles) plotted. Finally, the shortest tour is plotted using leaflet.

library(geodist)
library(TSP)
library(leaflet)

## build a distance matrix
dist_m <- geodist(df_use_locations %>% select(lat, long), measure = "geodesic")

## build TSP and solve 1000 times
set.seed(1234)
tsp <- TSP(dist_m, labels = df_use_locations$address)
beer_trip <- lapply(seq(1000), function(x) solve_TSP(tsp))

## plot results and identify the shortest tour
tours <- sapply(beer_trip, tour_length) / 1609.34
plot(tours, type = "l", ylab = "tour length (miles)")
shortest_tour <- which.min(tours)


## reorder locations according to shortest tour
ref_order <- unlist(lapply(beer_trip[[shortest_tour]], function(x) x), use.names = FALSE)
df_solution <- df_use_locations[ref_order, ]
df_solution[nrow(df_solution) + 1, ] <- df_solution[1, ]

## plot tour on leaflet map
leaflet(data = df_solution) %>% 
  addTiles() %>% 
  addCircleMarkers(~long, ~lat, popup = ~brewery, radius = 2, color = "red") %>% 
  addPolylines(~long, ~lat, weight = 4)