library(readr)
library(dplyr)
library(here)
## read in beer data
<- read_csv(here("data/beers.csv"))
df_beers
## read in beer ratings
<- read_csv(here("data/reviews.csv"))
df_ratings
## read in breweries
<- read_csv(here("data/breweries.csv"))
df_breweries
## summarize ratings by beer and sort
<- df_ratings %>%
df_rating_summary group_by(beer_id) %>%
summarise(rating = mean(score, na.rm = TRUE)) %>%
arrange(desc(rating)) %>%
filter(rating >= 4.75)
## add ratings summary to beer data frame
<- df_beers %>%
df_beer_ratings right_join(df_rating_summary, by = c("id" = "beer_id")) %>%
filter(retired == FALSE)
## add brewery details
<- df_beer_ratings %>%
df_beer_ratings select(id, beer=name, brewery_id, rating, abv) %>%
left_join(df_breweries %>% select(id, brewery=name, city, state, country), by = c("brewery_id" = "id")) %>%
arrange(desc(rating))
## limit to single occurrence of each brewery
<- df_beer_ratings %>%
df_beer_ratings group_by(brewery_id) %>%
arrange(desc(rating), desc(abv)) %>%
slice(1) %>%
ungroup()
The traveling salesperson problem (TSP) is an NP-complete problem that asks given a list of cities, what’s the shortest path that can be taken so that each city is visited once and you end up back where you started. Exact solutions to the problem exist but they become impractical as soon as the number of cities increases above 20 therefore approximations are generally performed that yield a suitable, if not best, solution.
Here we take a fun approach from a thought experiment. What if I could idenfity the best beers in the World and drink them in the breweries in which they are brewed. With a limited time and budget, wouldn’t this be a traveling salesperson problem? With this is mind I quickly drew up the following solution using the {TSP} R package.
Step 1 - Gather and wrangle the data
Data were retrieved from Kaggle. Kaggle contains a few beer datasets but the one chosen (https://www.kaggle.com/ehallmar/beers-breweries-and-beer-reviews) contains beers, reviews and breweries, making our life much easier. Once data were downloaded we could identify top-rated beers and join brewery details to produce a single, neat data frame.
Step 2 - Convert locations to longitude/latitude
Now we have a table with locations we can easily convert the locations to longitude/latitude values which will be used to calculate distances. First we need to convert the county code from a two-character abbreviation and then we can use {tidygeocoder} to convert to longitude and latitude.
library(countrycode)
library(tidygeocoder)
## build city addresses
<- df_beer_ratings %>%
df_beer_ratings mutate(country_name = countrycode(country, origin = "iso2c", destination = "country.name")) %>%
mutate(address = if_else(country == "US", paste(city, state, "USA", sep = ", "), paste(city, country_name, sep = ", ")))
## take the first 300 locations
<- df_beer_ratings %>%
df_use_locations slice(1:300)
## determine longitude and latitude
<- geo(df_use_locations$address, method = "cascade")
df_address
## bind locations to data frame
<- bind_cols(df_use_locations, df_address %>% select(lat, long)) %>%
df_use_locations filter(!is.na(lat), !is.na(long))
Step 3 - Build a distance matrix and run TSP
Once we have a list of longitude/latitude pairs we can build a distance matrix of the distances between each location (default measurement = meters). From this we can solve the traveling salesperson problem. In this example TSP is solved 1000 times and the distance (in miles) plotted. Finally, the shortest tour is plotted using leaflet.
library(geodist)
library(TSP)
library(leaflet)
## build a distance matrix
<- geodist(df_use_locations %>% select(lat, long), measure = "geodesic")
dist_m
## build TSP and solve 1000 times
set.seed(1234)
<- TSP(dist_m, labels = df_use_locations$address)
tsp <- lapply(seq(1000), function(x) solve_TSP(tsp))
beer_trip
## plot results and identify the shortest tour
<- sapply(beer_trip, tour_length) / 1609.34
tours plot(tours, type = "l", ylab = "tour length (miles)")
<- which.min(tours)
shortest_tour
## reorder locations according to shortest tour
<- unlist(lapply(beer_trip[[shortest_tour]], function(x) x), use.names = FALSE)
ref_order <- df_use_locations[ref_order, ]
df_solution nrow(df_solution) + 1, ] <- df_solution[1, ]
df_solution[
## plot tour on leaflet map
leaflet(data = df_solution) %>%
addTiles() %>%
addCircleMarkers(~long, ~lat, popup = ~brewery, radius = 2, color = "red") %>%
addPolylines(~long, ~lat, weight = 4)
Conclusion
The data may or may not be the best quality but the concept holds true. Total length of trip (between 297 locations) = 50718 miles.
Beer Trip Around the World – Harvey Lieberman Beer Trip Around the World – Harvey Lieberman Beer Trip Around the World – Harvey Lieberman Harvey Lieberman Traveling Salesman Problem between breweries Traveling Salesman Problem between breweries