---
title: "Cache and downloads"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cache and downloads}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

`datasusr` can cache DATASUS downloads in a local directory so that repeated calls do not hit the DATASUS FTP again. This is especially useful when developing analysis pipelines interactively.

## How caching works

When you call `datasus_download()` with `use_cache = TRUE` (the default), files are stored in a structured subdirectory tree under the cache folder. On subsequent calls for the same files, the cached versions are reused without any network access.

```{r eval = FALSE}
library(datasusr)

downloads <- datasus_fetch(
  source = "SIHSUS",
  file_type = c("RD", "SP"),
  year = 2024,
  month = 1,
  uf = c("PE", "PB")
)
```

## Configuring the cache directory

By default, downloads are placed in a session-scoped subdirectory of `tempdir()` (which R cleans up automatically when the session ends), so the package never writes outside the user-controlled tempdir unless you opt in. The cache location is resolved in the following order:

1. The `cache_dir` function argument
2. The `DATASUSR_CACHE_DIR` environment variable
3. The `datasusr.cache_dir` R option
4. The session default (`file.path(tempdir(), "datasusr-cache")`)

To enable a persistent cache that survives across sessions, point one of the first three settings to a directory of your choice, for example `tools::R_user_dir("datasusr", "cache")`.
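
The lookup order described above can be sketched in base R. This is only an illustration of the documented precedence, not the package's internal code: `resolve_cache_dir()` is a hypothetical helper, and treating an empty environment variable as unset is an assumption.

```{r eval = FALSE}
# Hypothetical helper illustrating the documented lookup order.
# Not part of datasusr; the package's internals may differ.
resolve_cache_dir <- function(cache_dir = NULL) {
  # 1. Explicit function argument
  if (!is.null(cache_dir)) {
    return(cache_dir)
  }

  # 2. Environment variable (assumption: an empty value counts as unset)
  env_dir <- Sys.getenv("DATASUSR_CACHE_DIR", unset = "")
  if (nzchar(env_dir)) {
    return(env_dir)
  }

  # 3. R option
  opt_dir <- getOption("datasusr.cache_dir")
  if (!is.null(opt_dir)) {
    return(opt_dir)
  }

  # 4. Session default under tempdir()
  file.path(tempdir(), "datasusr-cache")
}
```

Under this sketch, an explicit argument always wins, so `resolve_cache_dir("/some/dir")` returns `"/some/dir"` regardless of the environment variable or option.
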

To set the cache directory globally, add a line to your `.Renviron`:

```
DATASUSR_CACHE_DIR=/path/to/my/cache
```

Or in R:

```{r eval = FALSE}
options(datasusr.cache_dir = "/path/to/my/cache")
```

## Inspecting the cache

```{r eval = FALSE}
# Quick summary
datasus_cache_info(verbose = TRUE)

# Detailed listing of all cached files
datasus_cache_list()
```

## Forcing a re-download

Pass `refresh = TRUE` to `datasus_download()` (or `datasus_fetch()`) to re-download files even when they exist in the cache:

```{r eval = FALSE}
datasus_download(files, refresh = TRUE)
```

## Pruning and clearing the cache

Over time the cache can grow large. Two functions help manage its size:

```{r eval = FALSE}
# Remove files older than 90 days
datasus_cache_prune(older_than_days = 90)

# Keep the total cache under 5 GiB (5 * 1024^3 bytes)
datasus_cache_prune(max_size_bytes = 5 * 1024^3)

# Remove everything
datasus_cache_clear()
```

When pruning by size, the least-recently-accessed files are removed first.
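
To make the least-recently-accessed rule concrete, here is a minimal base R sketch of size-based pruning. It is not the implementation used by `datasus_cache_prune()`: `prune_by_size()` is a hypothetical helper, and it assumes that the access times reported by `file.info()` are meaningful on your filesystem (some systems update them lazily or not at all).

```{r eval = FALSE}
# Hypothetical helper: delete least-recently-accessed files until the
# directory fits within max_size_bytes. Not the datasusr implementation.
prune_by_size <- function(dir, max_size_bytes) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) {
    return(invisible(character()))
  }

  info <- file.info(files)

  # Oldest access time first (assumption: atime is reliable here)
  ord <- order(info$atime)
  files <- files[ord]
  sizes <- info$size[ord]

  if (sum(sizes) <= max_size_bytes) {
    # Already within the limit: nothing to remove
    to_remove <- character()
  } else {
    # Bytes left if file i and every older file are deleted
    remaining <- sum(sizes) - cumsum(sizes)
    # Smallest number of oldest files whose removal fits the limit
    n_remove <- which(remaining <= max_size_bytes)[1]
    to_remove <- files[seq_len(n_remove)]
  }

  unlink(to_remove)
  invisible(to_remove)
}
```

The function returns the removed paths invisibly, so a call such as `removed <- prune_by_size(my_dir, 5 * 1024^3)` lets you inspect what was deleted.
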