--- title: "riverconn vignette" author: "Damiano Baldan" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: TRUE number_sections: TRUE vignette: > %\VignetteIndexEntry{riverconn vignette} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} urlcolor: blue --- ```{r setup, echo = FALSE} knitr::opts_chunk$set(crop = FALSE) ``` # Indices to assess riverscape connectivity The riverconn package is used to calculate indices for river network connectivity. For a review of the indices, see [Jumani et al., 2021](https://doi.org/10.1088/1748-9326/abcb37), while for a list of the functionalities of the package and its architecture, see [Baldan et al., 2022](https://doi.org/10.1016/j.envsoft.2022.105470). If you use this package, cite it as: https://doi.org/10.1016/j.envsoft.2022.105470. For a tutorial on how to generate graphs representing rivers from "real world" data see: https://damianobaldan.github.io/riverconn_tutorial/ ## The river network as a graph This package implements algorithms to compute commonly used indices to assess river networks connectivity All those indices assume a conceptualization of the river networks as a graph $L = (E,V)$, where vertices (nodes) $V$ represent single reaches, and edges (links) $E$ represent either confluences or longitudinal barriers. For example, the graphs below represents a river with ten reaches. The graph on the left is directed, i.e. edges are defined for ordered pair of vertices. The graph on the right is undirected, as the order of vertices for the definition of edges is unimportant. Both graph have a 'tree-like' structure, since no loops exist (acyclic graphs): this structure can be used to describe a river system. In both examples, both barriers and confluences are present. The edges between nodes 1 and 2 and 3 and 2 are confluences. The edge between node 2 and 4 is a barrier. ```{r graph example, message = FALSE, collapse = TRUE, width = 60, warning = FALSE, echo = FALSE} library(igraph) g <- graph_from_literal(1-+2, 2-+4, 3-+2, 4-+6, 6-+7, 5-+6, 7-+8, 9-+5, 10-+5 ) oldpar <- par(mfrow = c(1,1)) par(mfrow=c(1,2)) plot(g, layout = layout_as_tree(as.undirected(g), root = 8, flip.y = FALSE)) plot(as.undirected(g), layout = layout_as_tree(as.undirected(g), root = 8, flip.y = FALSE)) par(oldpar) ``` ## Generalized riverscape connectivity index river networks-level connectivity can be expressed in terms of coincidence probability (Pascual-Hortal and Saura, 2006), i.e. the probability that two random points in a river network are connected. Once the dispersal probability $I_{ij}$ is defined for each couple of $i,j$ nodes in the graph, generalized connectivity indices for catchment and reach scales can be calculated. ### Catchment Connectvity index The catchment-scale connectivity index (CCI) is calculated as: $$ \ CCI = \sum_{i = 1}^{n} \sum_{j = 1}^{n} I_{ij} \frac{w_i w_j}{W^2} $$ Where $w_i$ and $w_j$ are some node-level attributes (weights), and $W$ is the sum of sum of the nodes weights for the whole river networks. ### Reach Connectivity Index The reach-scale connectivity index (RCI) is calculated by limiting the summation to all the connections to the single node $i$. $$ \ RCI_i = \sum_{j = 1}^{n} I_{ij} \frac{w_j}{W} $$ Where $w_j$ are some node-level attributes (weights), and $W$ is the sum of sum of the nodes weights for the whole river networks. ### Node weights Nodes weights can be arbitrarily chosen. Common features used are the reach length $l_i$, area $A_i$, or volume $V_i$. Alternatively, the habitat suitability index (HSI) can be used, defined as the ratio of length/area that is suitable for a specific organism. $$ \ HSI_i = \frac{{l}_{i,suitable}}{l_i} $$ Here ${l}_{i,suitable}$ is the fraction of reach length of reach $i$ that are suitable, and are usually referred to as weighted suitable length or weighted suitable area. ## Dispersal probability The dispersal probability depends on several factors: the presence of barriers between nodes $i$ and $j$ , the presence of suitable habitats in nodes $i$ and $j$ and alongside the connection, and the distance between $i$ and $j$. The dispersal probability $I_{ij}$ is thus determined by several contributions. Those contributions are multiplied: $$ \ I_{ij} = c_{ij}B_{ij} $$ where $c_{ij}$ accounts for the structural connectivity, i.e. it depends exclusively on the presence of barriers between nodes $i$ and $j$, and $B_{ij}$ accounts for the functional connectivity, i.e. it depends exclusively on the distance and the organisms movement/dispersal abilities. ### Structural connectivity The structural connectivity depends on the presence of barriers between nodes $i$ and $j$, and can be expressed as a function of the types of barriers present in the path expressed as a sequence of passability values. The passability $p_m$ for the $m$-th barrier is defined as the probability that the reaches immediately upstream and downstream the barrier $m$ are connected. If the flow directionality is not relevant (i.e. the river graph can be conceptualized as undirected), $$ \ c_{ij} = \prod_{m = 1}^{k} p_m^u p_m^d $$ Where the product extends over the $k$ nodes that are part of the path connecting reaches $i$ and $j$, $p_m^u$ is the upsstream passability of the $m$-th barrier and $p_m^u$ is the upstream passability of the $m$-th barrier. This definition based solely on products yields a symmetric coincidence probability (i.e. $c_{ij} = c_{ji}$). A directional version of $c_{ij}$ can be defined as: $$ \ c_{ij} = \prod_{m = 1}^{k} p_m^{eq} $$ $$ p_m^{eq} = \begin{cases} p_m^u & \mbox{if barrier m is encountered moving upstream in the path from i to j } \\ p_m^d & \mbox{if barrier m is encountered moving downstream in the path from i to j } \end{cases} $$ If $i$ and $j$ are located in different sub-catchments, the path from i to j will be moving downstream in some sections and upstream in some other secions: this $p_m^{eq}$ definition ensures the retained passability value is consistent with the directionality of the path from $i$ to $j$ (i.e. $c_{ij} \ne c_{ji}$) ### Functional connectivity Functional connectivity can be calculated as a function of the distance between reaches. An exponential dispersal kernel can be used: $$ \ B_{ij} = PD^{d_{ij}} $$ where $PD$ is in the $(0,1)$ interval (smaller values mean more restricted movement), and $d_{ij}$ is the distance between reaches $i$ and $j$. Alternatively, a threshold based probability can be used: $$ \ B_{ij} = \begin{cases} 0 & \mbox{when } d_{ij}>d_{tr} \\ 1 & \mbox{when } d_{ij}<=d_{tr} \end{cases} $$ Both definitions can be easily adapted to asymmetric dispersal by defining $PD_{d}$, $PD_{u}$, $d_{tr,u}$, and $d_{tr,d}$, and calculating $B_{ij} = B_{ij}^u B_{ij}^d$ where $B_{ij}^u$ and $B_{ij}^d$ are the index $B_{ij}$ contribution calculated for the 'downstream moving' and 'upstream moving' sections in the path from reach $i$ to $j$. The distance $d_{ij}$ can be either the geometric distance, or any other measure of effective distance (e.g. $d_{ij} / (1-{HSI}_{ij})$ provides an estimate of effective distance that depends on the habitat suitability index between reaches $i$ and $j$) ## Prioritization of barriers All the defined connectivity index can be used to prioritize barriers removal with a 'leave-one-out' approach. For each barrier, the index $dCCI$ can be defined as: $$ \ dCCI_{m} = 100 \frac{CCI - CCI_{m, removed}}{GCI} $$ where $CCI$ is the generalized connectivity index calculated for the original river networks with all the barriers implemented, and CCI_{m, removed} is the index recalculated when barrier $m$ is removed or its passability is changed (an equivalent for the reach scale, $dRCI_{i}$, can be defined similarly) . An alternative version of the index for prioritizing barriers can be calculated as the decrease in river networks connectivity after a single barrier is implemented, with a 'add-one' approach. ## Time-dependent connectivity When barriers metadata on the year of construction and the year of implementation of mitigation measures are available, a time trajectory of GCI can be computed (e.g. Segurado et al., 2013). # Preprocessing of input data This package relies heavily on the functionalities of the `igraph` package. The `igraph` package implements routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more. The package allows for easy construction of `igraph` objects based on edges and vertices lists or adjacency matrices. The book 'Statistical Analysis of Network Data with R' by Kolaczyk and Csardi (2014) offers a comprehensive tutorial on the possibilities offered by the 'igraph' package. A more comprehensive tutorial, including a real-world case study can be found here: https://damianobaldan.github.io/riverconn_tutorial/ ## Preliminary steps ```{r libraries, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} library(igraph) library(dplyr) library(tidyr) library(viridis) library(riverconn) library(doParallel) ``` ## Input class 'igraph' object All the functions implemented in this package use as main input an object of class `igraph`. There are different ways an object of class `igraph` can be created. A symbolic sequence of edges can be used with the function `graph_from_literal` for small, toy graphs. ```{r g definition, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} g <- graph_from_literal(1-+2, 2-+5, 3-+4, 4-+5, 6-+7, 7-+10, 8-+9, 9-+10, 5-+11, 11-+12, 10-+13, 13-+12, 12-+14, 14-+15, 15-+16) g ``` Note that when a graph is defined this way, edged and vertices attributes are not defined. ```{r g inspection, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} # Edges E(g) # vertices V(g) ``` The graph can be converted to data frame with the function `as_data_frame`, specifying if edges or vertices are to be exported. Accordingly, the function `graph_from_data_frame` can be used to create an igraph object from a data frame. ```{r g as sf, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} igraph::as_data_frame(g, what = "edges") igraph::as_data_frame(g, what = "vertices") ``` Finally, an igraph object can be exported to and generated from adjacency matrices using the functions `as_adjacency_matrix` and `graph_from_adjacency_matrix`, specifying if edges or vertices are to be exported. ```{r g as adj, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} igraph::as_adjacency_matrix(g) ``` ## Decorating the class 'igraph' object Once the structure of the network is defined, the graph can be decorated with edges and vertices attributes. Attributes can be either added directly to the graph or joined to the edges and vertices data frame. edges and vertices attributes are saved as vectors, so common, data.frame-like operations are possible. Here we add the dam information as edge attribute, including the field 'id_dam', and the reach information data as vertices attributes, including the length and the corresponding habitat suitability index. ```{r g decorate, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} # Decorate edges E(g)$id_dam <- c("1", NA, "2", "3", NA, "4", NA, "5", "6", NA, NA, NA, NA, "7", NA) E(g)$type <- ifelse(is.na(E(g)$id_dam), "joint", "dam") E(g) # Decorate vertices V(g)$length <- c(1, 1, 2, 3, 4, 1, 5, 1, 7, 7, 3, 2, 4, 5, 6, 9) V(g)$HSI <- c(0.2, 0.1, 0.3, 0.4, 0.5, 0.5, 0.5, 0.6, 0.7, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8) V(g)$Id <- V(g)$name V(g) ``` ## Assigning network directionality The `riverconn` package implements the function `set_graph_directionality` that allows to assign the directionality of the graph once an outlet is defined. ```{r direction, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} oldpar <- par(mfrow = c(1,1)) par(mfrow=c(1,3)) g1 <- set_graph_directionality(g, field_name = "Id", outlet_name = "16") g2 <- set_graph_directionality(g, field_name = "Id", outlet_name = "5") plot(as.undirected(g), layout = layout_as_tree(as.undirected(g), root = 8, flip.y = FALSE)) plot(g1, layout = layout_as_tree(as.undirected(g1), root = 16, flip.y = FALSE)) plot(g2, layout = layout_as_tree(as.undirected(g2), root = 5, flip.y = FALSE)) par(oldpar) ``` # Indices calculation The function `index_calcualtion` is used to calculate all the nuances of the CCI and RCI Before calculation, the information on the barriers passability are needed. ```{r pass def, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} # Check edged and nodes attributes, add pass_u and pass_d fields g_v_df <- igraph::as_data_frame(g, what = "vertices") g_v_df g_e_df <- igraph::as_data_frame(g, what = "edges") %>% mutate(pass_u = ifelse(!is.na(id_dam),0.1,NA), pass_d = ifelse(!is.na(id_dam),0.7,NA)) g_e_df # Recreate graph g <- igraph::graph_from_data_frame(d = g_e_df, vertices = g_v_df) g ``` Index with default settings. ```{r index 1, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} index_calculation(g, param = 0.9) ``` Index with default settings, only $c_{ij}$ or $B_{ij}$ contributions ```{r index 2, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} index_calculation(g, B_ij_flag = FALSE) index_calculation(g, param = 0.9, c_ij_flag = FALSE) ``` Index with default settings, only $B_{ij}$ contributions with threshold on the distance ```{r index 3, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} index_calculation(g, c_ij_flag = FALSE, dir_distance_type = "asymmetric", disp_type = "threshold", param_u = 0, param_d = 5) index_calculation(g, c_ij_flag = FALSE, dir_distance_type = "asymmetric", disp_type = "threshold", param_u = 5, param_d = 10) index_calculation(g, c_ij_flag = FALSE, dir_distance_type = "symmetric", disp_type = "threshold", param = 10) ``` Index for reach, inbound connections used, only $B_{ij}$ contributions with threshold on the distance ```{r index 4, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} index_calculation(g, c_ij_flag = FALSE, index_type = "reach", index_mode = "to", dir_distance_type = "asymmetric", disp_type = "threshold", param_u = 0, param_d = 5) index_calculation(g, c_ij_flag = FALSE, dir_distance_type = "asymmetric", index_type = "reach", index_mode = "to", disp_type = "threshold", param_u = 5, param_d = 10) index_calculation(g, c_ij_flag = FALSE, index_type = "reach", index_mode = "to", dir_distance_type = "symmetric", disp_type = "threshold", param = 10) ``` # Barriers prioritization calculation The function `index_calcualtion` allows to calculate the CCI and RCI changes when barriers are removed. Metadata on which dams are to be removed and how the passability changes are to be provided in the 'dams_metadata' object. Parallel calculations can be activated. ```{r idams metadata, message = FALSE, collapse = TRUE, width = 60, warning = FALSE} dams_metadata <- data.frame("id_dam" = c("1", "2", "3", "4", "5", "6", "7"), "pass_u_updated" = c(1, 1, 1, 1, 1, 1, 1), "pass_d_updated" = c(1, 1, 1, 1, 1, 1, 1)) dams_metadata d_index_calculation(g, barriers_metadata = dams_metadata, id_barrier = "id_dam", parallel = FALSE, ncores = 3, param_u = 10, param_d = 10, param = 0.5, index_type = "full", dir_distance_type = "asymmetric", disp_type = "threshold") ``` # References and key literature Belletti, Barbara, et al. “More than one million barriers fragment Europe’s rivers.” Nature 588.7838 (2020): 436-441. Cote, D., Kehler, D. G., Bourne, C., & Wiersma, Y. F. (2009). A new measure of longitudinal connectivity for stream networks. Landscape Ecology, 24(1), 101-113. Kolaczyk, E. D., & Csárdi, G. (2014). Statistical analysis of network data with R (Vol. 65). New York: Springer. Jumani, S., Deitch, M. J., Kaplan, D., Anderson, E. P., Krishnaswamy, J., Lecours, V., & Whiles, M. R. (2020). River fragmentation and flow alteration metrics: a review of methods and directions for future research. Environmental Research Letters.