--- title: "mvDFA" subtitle: "Multi-Variate Detrended Fluctuation Analysis using Package `mvDFA`" author: "Julien P. Irmer" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{mvDFA} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This `R` package provides an implementation of multivariate extensions of a well-known fractal analysis technique, Detrended Fluctuations Analysis (DFA; Peng et al., 1995, ), for multivariate time series: multivariate DFA (mvDFA). Several coefficients are implemented that take into account the correlation structure of the multivariate time series to varying degrees. These coefficients may be used to analyze long memory and changes in the dynamic structure that would by univariate DFA. Therefore, this `R` package aims to extend and complement the original univariate DFA (Peng et al., 1995) for estimating the scaling properties of nonstationary time series. ## Install the package Install from CRAN: ```{r, eval=FALSE} install.packages("mvDFA") ``` ## Install the latest working version from GitHub This requires the package `devtools`. The following code installs `devtools`, if it is not installed. ```{r, eval=FALSE} if(!"devtools" %in% installed.packages()) install.packages("devtools") devtools::install_github("jpirmer/mvDFA") ``` Load the package: ```{r setup} library(mvDFA) ``` ## Test functions for approximated correlated Time Series with different Hurst Exponent ### Simulate correlated time series Choose a covariance matrix $\Sigma$ ```{r} Sigma <- Sigma <- matrix(.5, 4, 4) diag(Sigma) <- 1 Sigma[3,4] <- Sigma[4,3] <- -.3 Sigma ``` Simulate from the covariance matrix with approximate univariate Hurst-exponents $H:=(.2, .5, .7, .9)'$. ```{r} set.seed(2023) X <- simulate_cMTS(N = 10^3, H = c(.2, .5, .7, .9), Sigma = Sigma, simulation_process = "FGN0", decomposition = "chol", cor_increments = FALSE) head(X) ``` `simulation_process = "FGN0"` uses `longmemo::simFGN0` to generate independent (Fractional) Gaussian Processes, which are then mixed using the Cholesky decomposition $D$ of $\Sigma := DD'$. Changing this argument to `simulation_process = "FGN.fft"` uses `longmemo::simFGN.fft` (using FFT) to generate independent (Fractional) Gaussian Processes, which are then mixed using the Cholesky decomposition $D$ of $\Sigma := DD'$ via `decomposition = "chol"`. The differences may be inspected here: ```{r, fig.align='center', fig.height=3, fig.width=4} x1 <- simulate_cMTS(N = 3*10^2, H = c(.5), Sigma = as.matrix(1), simulation_process = "FGN0", cor_increments = FALSE) plot(x1$X1, main = "H = 0.5 and FGN0", type = "l") x2 <- simulate_cMTS(N = 3*10^2, H = c(.5), Sigma = as.matrix(1), simulation_process = "FGN.fft", cor_increments = FALSE) plot(x2$X1, main = "H = 0.5 and FGN.fft", type = "l") ``` Note: the true Hurst Exponents might be slightly off, since the mixing (weighted sums) of the time series makes them slightly more normal (due to the central limits theorem). Hence, small Hurst-exponents ($H<0.5$) tend to be upwards biased, while larger Hurst-exponents tend to be downward biased ($H>0.5$). Further, the use of Cholesky might influence the generation of data in a different way than another decomposition of $\Sigma$ would. An alternative is the eigen-value (or singular value) decomposition via `decomposition = "eigen"`. Further research is needed! Finally, we did not correlated the increments but the whole time series (`cor_increments = FALSE`), if instead increments should be correlated use `cor_increments = FALSE`. ### Estimate multivariate extensions of Hurst exponent for correlated time series The `mvDFA` function needs only the time series in long format as a `matrix` or `data.frame` object. Multiple `cores` can be used (maximum is `parallel::detectCores()`), which reduces computation time drastically for longer timer series. The `steps` argument manipulates the number of window sizes, which are separated logarithmically. Larger numbers of steps result in more precise estimates of the extended Hurst exponents, but require more computation time. The `degree` option specifies the degree of the polynomial of the time trend used for detrending. ```{r, message=FALSE, warning=FALSE} mvDFA_result <- mvDFA(X = X, steps = 50, cores = 1, degree = 1) mvDFA_result ``` The slope of the log-log regression of the root-mean square fluctuations $RMS$ vs. window size $s$ gives an estimate for the long memory coefficient $L$ within the data. The output argument `Ltot` corresponds to the total variance approach, which uses root-mean of the sum of the diagonal elements of the covariance matrix $C_{\nu,s}$ of the detrended fluctuation object $Y_{\nu,s}$ within each window to compute the corresponding root mean square fluctuation $RMS_\text{tot}(\nu,s)$ averaged across all the subsequences $\nu$ of length $s$. `Lgen` corresponds to the generalized variance approach, which uses root-mean of the determinant of the covariance matrix $C_{\nu,s}$ of the detrended fluctuation object $Y_{\nu,s}$ within each window to compute the corresponding root mean square fluctuation $RMS_\text{tot}(\nu,s)$ averaged across all the subsequences $\nu$ of length $s$. Further, we output the corresponding coefficient for all variances and covariances individually (see `Lfull`), averaged across all univariate time series (see `LmeanUni`) and done for each univariate time series individually (see `univariate_DFA`). Further arguments hidden in the primary output are the corresponding $R^2$ per analysis approach: ```{r} mvDFA_result[6:10] ``` Further, the used RMS objects and the used window sizes can be extracted using the following arguments: ```{r} mvDFA_result$S mvDFA_result$RMS_tot mvDFA_result$RMS_gen head(mvDFA_result$CovRMS_s) ``` which may be used to compute the Hurst exponents by hand or to display the log-log-plots. We present this for the total and the generalized approach: ```{r, fig.align='center', fig.height=5, fig.width=5} # total df_tot <- data.frame(mvDFA_result[c("S", "RMS_tot")]) reg_tot <- lm(I(log10(RMS_tot)) ~ 1 + I(log10(S)), data = df_tot) coef(reg_tot)[2] mvDFA_result$Ltot plot(log10(df_tot)) abline(reg_tot) ``` ```{r, fig.align='center', fig.height=5, fig.width=5} # generalized df_gen <- data.frame(mvDFA_result[c("S", "RMS_gen")]) reg_gen <- lm(I(log10(RMS_gen)) ~ 1 + I(log10(S)), data = df_gen) coef(reg_gen)[2]/4 mvDFA_result$Lgen plot(log10(df_gen)) abline(reg_gen) ``` *** ### References Peng, C. K., Havlin, S., Stanley, H. E., & Goldberger, A. L. (1995). Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time-series. Chaos, 5, 82–87.