---
title: "Using the RNGforGPD package"
author: "Hesen Li, Ruizhe Chen, Hai Nguyen, Yu-Che Chung, Ran Gao, Hakan Demirtas"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using the RNGforGPD package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, echo = FALSE, include = FALSE}
library(RNGforGPD)
library(corpcor)
library(mvtnorm)
library(Matrix)
```

# Introduction

This vignette outlines the ideas behind the generalized Poisson distribution and gives examples of applying the functions in this package (**RNGforGPD**).

# Functions and Comments

**GenUniGpois**

The appropriate data generation method depends on the parameter values, because restrictions apply when the rate parameter (theta) or the dispersion parameter (lambda) of the generalized Poisson distribution falls within certain ranges. For example, the normal approximation method does not work well for theta < 10.

```{r}
GenUniGpois(2, 0.9, 5000, method = "Branching")
GenUniGpois(5, -0.4, 5000, method = "Inversion")
GenUniGpois(12, 0.5, 5000, method = "Normal-Approximation")
data = GenUniGpois(3, 0.9, 10, method = "Build-Up", details = FALSE)
data
data = GenUniGpois(10, 0.4, 10, method = "Chop-Down", details = FALSE)
data
```

**ComputeCorrGpois**

In practice, the attainable correlations between variables typically span a narrower range than the theoretical bounds of −1 and 1, because the marginal distributions impose their own lower and upper bounds. A simple sorting technique can be used to obtain approximate correlation bounds, and this approach works regardless of the data type or the distributional assumption (Demirtas and Hedeker, 2011). Using the sorting technique, we wrote the function `ComputeCorrGpois`, which computes the lower and upper correlation bounds between a pair of generalized Poisson variables.
`ComputeCorrGpois` also serves as an integral part of the `ValidCorrGpois` function, which examines whether the entries of a pairwise correlation matrix fall within the limits imposed by the marginal distributions.

```{r}
set.seed(3406)
ComputeCorrGpois(c(3, 2, 5, 4), c(0.3, 0.2, 0.5, 0.6), verbose = FALSE)
ComputeCorrGpois(c(4, 5), c(-0.45, -0.11), verbose = FALSE)
```

**ValidCorrGpois**

This function checks that a pairwise correlation matrix satisfies the required conditions: positive definiteness, symmetry, correct dimensions, and correlations that fall within the correlation bounds. `ValidCorrGpois` ensures that the supplied correlation matrix is valid for simulating multivariate generalized Poisson distributions using `GenMVGpois`.

```{r}
ValidCorrGpois(matrix(c(1, 0.9, 0.9, 1), byrow = TRUE, nrow = 2),
               c(0.5, 0.5), c(0.1, 0.105), verbose = TRUE)
ValidCorrGpois(matrix(c(1, 0.9, 0.9, 1), byrow = TRUE, nrow = 2),
               c(3, 2), c(-0.3, -0.2), verbose = TRUE)
```

**QuantileGpois**

This function computes quantiles for the generalized Poisson distribution. When lambda is negative, we force $m \geq 4$ to guarantee that there are at least five classes.

```{r}
QuantileGpois(0.98, 1, -0.2, details = TRUE)
QuantileGpois(0.80, 2, 0.025, details = FALSE)
```

**CorrNNGpois**

This function applies the method proposed by Yahav and Shmueli (2012). They found that the relationship between the desired correlation matrix and the actual correlation matrix of a generalized Poisson distribution can be approximated by an exponential function. Following their simple, empirical approximation method, we can correct the actual correlation matrix toward the desired correlation matrix. Note that some desired correlations might be infeasible.

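
To make the exponential approximation concrete, here is a sketch of one form it can take, as we understand the approach of Yahav and Shmueli (2012); it is not necessarily the exact expression implemented in `CorrNNGpois`. The symbols are introduced here for illustration only: $\rho_{\text{in}}$ is the correlation of the underlying normal variables, $\rho_{\text{out}}$ is the resulting correlation of the generalized Poisson pair, and $\rho_{\min}$, $\rho_{\max}$ are the lower and upper correlation bounds for that pair (assuming $\rho_{\max} + \rho_{\min} \neq 0$).

$$
\rho_{\text{out}} \approx a\, e^{b\, \rho_{\text{in}}} + c, \qquad
a = -\frac{\rho_{\max}\, \rho_{\min}}{\rho_{\max} + \rho_{\min}}, \qquad
b = \log\!\left(\frac{\rho_{\max} + a}{a}\right), \qquad
c = -a .
$$

Solving for the input correlation that yields a desired $\rho_{\text{out}}$ gives

$$
\rho_{\text{in}} = \frac{1}{b} \log\!\left(\frac{\rho_{\text{out}} + a}{a}\right).
$$

Under these assumptions the curve passes through $(-1, \rho_{\min})$, $(0, 0)$, and $(1, \rho_{\max})$. A desired correlation outside $[\rho_{\min}, \rho_{\max}]$ would require $|\rho_{\text{in}}| > 1$, which is one way to see why some desired correlations are infeasible.
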
```{r}
set.seed(3406)
CorrNNGpois(c(0.1, 10), c(0.1, 0.2), 0.5)
lambda.vec = c(-0.2, 0.2, -0.3)
theta.vec = c(1, 3, 4)
M = c(0.352, 0.265, 0.342)
N = diag(3)
N[lower.tri(N)] = M
TV = N + t(N)
diag(TV) = 1
cstar = CmatStarGpois(TV, theta.vec, lambda.vec, verbose = TRUE)
cstar
```

**GenMVGpois**

`GenMVGpois` (the engine function) is the most important function in this package (**RNGforGPD**). It depends on all the other functions in this package and three external packages: **mvtnorm**, **corpcor**, and **VGAM**. The major difference between the univariate and the multivariate generating functions is that the latter accounts for pairwise correlations between variables. These correlations can be verified using `ValidCorrGpois` and corrected by `CorrNNGpois`.

```{r}
set.seed(3406)
lambda.vec = c(-0.2, 0.2, -0.3)
theta.vec = c(1, 3, 4)
M = c(0.352, 0.265, 0.342)
N = diag(3)
N[lower.tri(N)] = M
TV = N + t(N)
diag(TV) = 1
cstar = CmatStarGpois(TV, theta.vec, lambda.vec, verbose = TRUE)
sample.size = 10000
no.gpois = 3
data = GenMVGpois(sample.size, no.gpois, cstar, theta.vec, lambda.vec, details = FALSE)
apply(data, 2, mean) # empirical means
theta.vec / (1 - lambda.vec) # theoretical means
apply(data, 2, var) # empirical variances
theta.vec / (1 - lambda.vec)^3 # theoretical variances
cor(data) # empirical correlation matrix
TV # specified correlation matrix
```

# Citations

Amatya, A. and Demirtas, H. (2015). Simultaneous generation of multivariate mixed data with Poisson and normal marginals. *Journal of Statistical Computation and Simulation*, **85(15)**, 3129-3139.

Amatya, A. and Demirtas, H. (2017). PoisNor: An R package for generation of multivariate data with Poisson and normal marginals. *Communications in Statistics - Simulation and Computation*, **46(3)**, 2241-2253.

Demirtas, H. (2017). On accurate and precise generation of generalized Poisson variates. *Communications in Statistics - Simulation and Computation*, **46(1)**, 489-499.


Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. *The American Statistician*, **65(2)**, 104-109.

Yahav, I. and Shmueli, G. (2012). On generating multivariate Poisson data in management science applications. *Applied Stochastic Models in Business and Industry*, **28(1)**, 91-102.