| Title: | Create Future 'EnergyPlus' Weather Files using 'CMIP6' Data |
|---|---|
| Description: | Query, download climate change projection data from the 'CMIP6' (Coupled Model Intercomparison Project Phase 6) project <https://pcmdi.llnl.gov/CMIP6/> in the 'ESGF' (Earth System Grid Federation) platform <https://esgf.llnl.gov>, and create future 'EnergyPlus' <https://energyplus.net> Weather ('EPW') files adjusted from climate changes using data from Global Climate Models ('GCM'). |
| Authors: | Hongyuan Jia [aut, cre] (ORCID: <https://orcid.org/0000-0002-0075-8183>), Adrian Chong [aut] (ORCID: <https://orcid.org/0000-0002-9486-4728>) |
| Maintainer: | Hongyuan Jia <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.4.9001 |
| Built: | 2026-07-01 17:53:58 UTC |
| Source: | https://github.com/ideas-lab-nus/epwshiftr |
Query and download climate change projection data from the CMIP6 (Coupled Model Intercomparison Project Phase 6) project on the ESGF (Earth System Grid Federation) platform, and create future EnergyPlus Weather (EPW) files adjusted for climate change using data from Global Climate Models (GCMs).
epwshiftr.verbose: If TRUE, more detailed messages are printed.
Default: FALSE.
epwshiftr.progress: If TRUE, progress bars are shown for long-running
operations that support them. Default: interactive().
epwshiftr.threshold_alpha: The threshold for the absolute value of alpha,
i.e. the monthly-mean fractional change, when performing morphing operations.
The default value is 3. If the morphing method is set to
"stretch" or "combined" and the absolute alpha exceeds the threshold,
a warning is issued and the morphing method falls back to
"shift" to avoid unrealistic morphed values.
epwshiftr.dir_store: The persistent store directory for query snapshots,
dictionaries, source mirrors, downloads, extraction results, outputs, and
the store manifest. If not set, tools::R_user_dir() with type "data"
will be used.
epwshiftr.cache: Controls caching behavior. TRUE enables normal
caching (default), FALSE disables caching entirely, and "offline"
enables offline mode where only cached data is used and no network
requests are made. Default: TRUE
epwshiftr.dir_cache: The directory for disposable cache entries. Deleting
this directory can require re-fetching or re-parsing data, but should not
invalidate a persistent store.
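These settings are ordinary R options, so they can be set with options() and read with getOption(). The sketch below sets two of them and mirrors the documented fallback rule for epwshiftr.threshold_alpha. The helper choose_morph_method() is hypothetical and written only for illustration; it is not a package function.

```r
# Set two epwshiftr options for this session and remember the old values
old <- options(
    epwshiftr.verbose = TRUE,
    epwshiftr.threshold_alpha = 3
)

# Hypothetical helper mirroring the documented rule: "stretch" and
# "combined" fall back to "shift" when abs(alpha) exceeds the threshold
choose_morph_method <- function(method, alpha,
                                threshold = getOption("epwshiftr.threshold_alpha", 3)) {
    if (method %in% c("stretch", "combined") && abs(alpha) > threshold) {
        warning("abs(alpha) exceeds the threshold; falling back to 'shift'")
        return("shift")
    }
    method
}

kept <- choose_morph_method("stretch", alpha = 0.2)
fell_back <- suppressWarnings(choose_morph_method("combined", alpha = -4.5))

# Restore the previous option values
options(old)
```

Here kept stays "stretch" because the fractional change is small, while fell_back becomes "shift" because the absolute alpha is above 3.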
Hongyuan Jia
Useful links:
Report bugs at https://github.com/ideas-lab-nus/epwshiftr/issues
[ is a one-dimensional shortcut for x$slice(i).
## S3 method for class 'EsgResult'
x[i, j, ..., drop = FALSE]
x |
An EsgResult object. |
i |
A row selector accepted by |
j, ..., drop
|
Unsupported. |
A new result object of the same type, or x for x[].
data_node_status() is the user-facing replacement for the legacy
data-node helper name used in earlier releases.
data_node_status( speed_test = FALSE, timeout = 3, index_node = INDEX_NODES[["ORNL"]] )
speed_test |
If |
timeout |
Timeout for each HTTP probe in seconds. Default: |
index_node |
The index node to query for data-node status.
Default: |
A data.table::data.table() of 2 or 3 (when speed_test is TRUE)
columns:
| Column | Type | Description |
|---|---|---|
| data_node | character | Web address of data node |
| status | character | Status of data node. "UP" means OK and "DOWN" means currently not available |
| probe_ms | double | HTTP probe elapsed time in milliseconds for UP data nodes |
## Not run:
data_node_status()
## End(Not run)
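A table with the documented columns can be filtered and ranked with ordinary data-frame operations. The sketch below uses a mock data frame in place of a real data_node_status(speed_test = TRUE) result; the host names and timings are made up for illustration.

```r
# Mock table with the documented columns; hosts and timings are invented
nodes <- data.frame(
    data_node = c("node-a.example.org", "node-b.example.org", "node-c.example.org"),
    status = c("UP", "DOWN", "UP"),
    probe_ms = c(180.5, NA, 95.2),
    stringsAsFactors = FALSE
)

# Keep only reachable nodes and rank them by probe time (fastest first)
up <- nodes[nodes$status == "UP", ]
up <- up[order(up$probe_ms), ]
fastest <- up$data_node[1]
```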
Downloader provides a general purpose file download system with:
File status management (missing, downloading, downloaded, verified)
Incremental checksum verification during download
Resume capability for interrupted downloads
Async and parallel download using mirai
Progress tracking
Error handling and retry logic
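As a rough illustration of the retry idea in the list above, the following base R sketch wraps a failing operation in a bounded retry loop with exponential backoff. It is an assumption about the general technique, not the Downloader implementation; with_retry() and flaky() are hypothetical names.

```r
# Hypothetical helper: call fun() and retry up to max_retries times on error
with_retry <- function(fun, max_retries = 3L, base_wait = 0) {
    attempt <- 0L
    repeat {
        attempt <- attempt + 1L
        result <- tryCatch(fun(), error = function(e) e)
        if (!inherits(result, "error")) return(result)
        # Give up once the initial attempt plus all retries have failed
        if (attempt > max_retries) stop(result)
        # Exponential backoff: base_wait, 2 * base_wait, 4 * base_wait, ...
        Sys.sleep(base_wait * 2^(attempt - 1L))
    }
}

# Stand-in for a download that fails twice before succeeding
calls <- 0L
flaky <- function() {
    calls <<- calls + 1L
    if (calls < 3L) stop("transient failure")
    "ok"
}

out <- with_retry(flaky, max_retries = 3L)
```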
data_dir: The final data directory.
tmp_dir: The temporary files directory.
max_retries: Maximum number of retry attempts.
timeout: Download timeout in seconds.
network_policy: Network options passed to libcurl.
node_policy: Data-node cooldown and ranking policy.
transfer_policy: Curl transfer policy.
resource_policy: Local resource and scheduling policy.
n_workers: Number of parallel workers.
manifest: Persistent download manifest path, or NULL.
config: Current downloader configuration as a named list. When
manifest is set, the configuration is stored in the
manifest's download_config table.
new()
Create a new Downloader object
Downloader$new( dest = NULL, temp = NULL, retries = 3L, timeout = 3600L, ssl_verifypeer = TRUE, proxy = NULL, connect_timeout = NULL, useragent = NULL, cleanup = TRUE, n_workers = 4L, node_policy = NULL, transfer_policy = NULL, resource_policy = NULL, manifest = NULL )
dest: A string specifying the directory for final
downloaded files. If NULL, uses a temporary directory. Default: NULL.
temp: A string specifying the directory for temporary files
(.part, .done). If NULL, uses dest/.tmp. Should ideally
be on the same filesystem as dest for atomic rename
operations. Default: NULL.
retries: A positive integer specifying the maximum number of
retry attempts for failed downloads. Default: 3L.
timeout: A positive integer specifying the timeout in seconds for
each download. Default: 3600L (1 hour).
ssl_verifypeer: Whether to verify HTTPS certificates. Default:
TRUE.
proxy: Optional proxy URL passed to libcurl. Default: NULL.
connect_timeout: Optional connection timeout in seconds passed
to libcurl. Default: NULL.
useragent: Optional HTTP user agent passed to libcurl.
Default: NULL.
cleanup: A logical value specifying whether to automatically clean up failed temporary
files. Default: TRUE.
n_workers: A non-negative integer specifying the number of parallel workers
for async downloads. If 0, async downloads fall back to synchronous mode.
Default: 4L.
node_policy: A list controlling historical data-node cooldown and ranking. Missing fields use conservative defaults.
transfer_policy: A list controlling curl transfer options and
optional experimental Range-piece downloads. Supported curl
fields are chunk_size, bandwidth_limit, low_speed_limit,
and low_speed_time. Range fields are range_mode
("off", "single", "multi", or "auto"), piece_size,
piece_concurrency, max_sources,
require_checksum_for_multisource, and range_probe_timeout.
The default range_mode = "off" keeps the existing streaming
download behavior.
resource_policy: A list controlling local resource checks and
scheduling. Supported fields are host_concurrency,
disk_preflight, and min_free_space.
manifest: Optional DuckDB manifest path for persistent
sessions, tasks, candidate URLs, and events. If NULL, only
the single-file shortcut API is available. Default: NULL.
A Downloader object.
\dontrun{
dl <- Downloader$new()
dl <- Downloader$new(dest = "~/data")
dl <- Downloader$new(
dest = "~/data",
temp = "~/data/.tmp",
n_workers = 8
)
}
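The advice to keep temp on the same filesystem as dest relates to the common pattern of writing to a .part file and renaming it into place once complete. The base R sketch below illustrates that pattern under the assumption that a rename within one filesystem is effectively atomic; the directory and file names are hypothetical and this is not the package's internal code.

```r
# Hypothetical layout: a unique final directory with a .tmp subdirectory
dest <- tempfile("dl-demo-")
temp <- file.path(dest, ".tmp")
dir.create(temp, recursive = TRUE, showWarnings = FALSE)

part <- file.path(temp, "data.nc.part")
final <- file.path(dest, "data.nc")

# Stand-in for streaming downloaded bytes into the partial file
writeLines("payload", part)

# Move the finished file into place in a single rename step
moved <- file.rename(part, final)
```

Readers of dest then see either no file or the complete file, never a half-written one.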
download()
Download a single file with state management and resume support
Downloader$download( url, filename = NULL, subdir = NULL, progress = TRUE, overwrite = FALSE, checksum = NULL, checksum_type = "sha256", resume = TRUE, block = TRUE, .tmp_id = NULL )
url: A string specifying the URL to download from.
filename: A string specifying the filename for the downloaded file. If NULL, uses
filename from URL. Default: NULL.
subdir: A string specifying the subdirectory within dest to save file.
Default: NULL (save directly in dest).
progress: A logical value specifying whether to show progress bar. Default: TRUE.
overwrite: A logical value specifying whether to overwrite existing file. Default: FALSE.
checksum: A string specifying the expected checksum for verification. If provided,
enables incremental checksum calculation. Default: NULL.
checksum_type: A string specifying the checksum type ("sha256" or "md5"). Default: "sha256".
resume: A logical value specifying whether to resume interrupted downloads.
Default: TRUE.
block: A logical value specifying whether to block until download completes.
If FALSE, downloads asynchronously in background. Default: TRUE.
.tmp_id: Internal temporary file ID used by persistent
download tasks. Default: NULL.
If block = TRUE, returns the path to the downloaded file.
If block = FALSE, returns a task ID for tracking the download.
\dontrun{
# Blocking download
path <- dl$download(url = "https://example.com/data.nc")
# Async download
task_id <- dl$download(
url = "https://example.com/data.nc",
block = FALSE
)
dl$wait_for_tasks(task_id)
# Multiple files (async batch)
urls <- c("https://example.com/file1.nc", "https://example.com/file2.nc")
task_ids <- sapply(urls, function(url) {
dl$download(url = url, block = FALSE)
})
results <- dl$wait_for_tasks(task_ids)
}
enqueue()
Add a download plan to the persistent manifest.
Downloader$enqueue(plan, session_label = NULL)
plan: A data frame with at least logical_file_id, filename,
and url columns.
session_label: Optional label for this download session.
The created session ID.
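A minimal download plan only needs the three documented columns. The sketch below builds one as a plain data frame and checks that the required columns are present; all IDs, file names, and URLs are made-up placeholders.

```r
# Hypothetical plan with the three columns that enqueue() documents as required
plan <- data.frame(
    logical_file_id = c("tas_day_001", "tas_day_002"),
    filename = c("tas_day_2050.nc", "tas_day_2051.nc"),
    url = c(
        "https://example.com/tas_day_2050.nc",
        "https://example.com/tas_day_2051.nc"
    ),
    stringsAsFactors = FALSE
)

# Check the required columns before handing the plan to a downloader
required <- c("logical_file_id", "filename", "url")
missing_cols <- setdiff(required, names(plan))
stopifnot(length(missing_cols) == 0L)
```

A plan of this shape would then be passed as dl$enqueue(plan) on a Downloader created with a manifest path, since the persistent API is documented as unavailable when manifest is NULL.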
preflight()
Check local resource requirements before downloading.
Downloader$preflight( plan = NULL, session_id = NULL, task_id = NULL, overwrite = FALSE )
plan: Optional download plan. If supplied, preflight is calculated without writing to the persistent manifest.
session_id: Optional persistent session ID.
task_id: Optional persistent task ID vector.
overwrite: Whether existing final files would be overwritten.
Default: FALSE.
A one-row data frame with byte and disk-space summary.
run()
Run queued persistent download tasks.
Downloader$run( session_id = NULL, task_id = NULL, block = TRUE, progress = TRUE, overwrite = FALSE, resume = TRUE )
session_id: Optional session ID.
task_id: Optional task ID vector.
block: Whether to block until completion. If FALSE, creates
a detached background job via $start().
progress: Whether to show per-file progress.
overwrite: Whether to overwrite existing final files.
resume: Whether to resume .part files.
If block = TRUE, a data frame of selected task records
after the run. If block = FALSE, a one-row background job
record.
start()
Start a persistent download session in the background.
Downloader$start(
session_id = NULL,
task_id = NULL,
overwrite = FALSE,
resume = TRUE,
mode = c("process", "daemon"),
store_path = NULL
)
session_id: Optional session ID.
task_id: Optional task ID vector.
overwrite: Whether to overwrite existing final files.
resume: Whether to resume .part files.
mode: Background execution mode. "process" starts a detached
Rscript; "daemon" submits the job to a running downloader
daemon.
store_path: Optional EsgStore path to sync after completion.
A one-row data frame describing the background job.
jobs()
List downloader background jobs.
Downloader$jobs(status = NULL)
status: Optional job status filter.
job_status()
Return downloader background job status.
Downloader$job_status(job_id = NULL)
job_id: Optional job ID filter.
job_logs()
Return downloader background job log lines.
Downloader$job_logs(job_id, tail = 100L)
job_id: Job ID.
tail: Number of trailing lines to return.
stop_job()
Request cancellation of a background downloader job.
Downloader$stop_job(job_id, force = FALSE)
job_id: Job ID.
force: Whether to kill the recorded process immediately.
daemon_start()
Start a persistent downloader daemon.
Downloader$daemon_start(port = NULL, heartbeat_interval = 5)
port: Optional localhost TCP port. If NULL, a random high
port is chosen.
heartbeat_interval: Seconds between daemon heartbeat checks.
A one-row data frame describing the daemon.
daemon_status()
Return downloader daemon status records.
Downloader$daemon_status()
daemon_stop()
Request the running downloader daemon to stop.
Downloader$daemon_stop(force = FALSE)
force: Whether to kill the daemon process immediately.
sessions()
List persistent download sessions.
Downloader$sessions()
tasks()
List persistent download tasks.
Downloader$tasks(session_id = NULL, job_id = NULL, status = NULL)
session_id: Optional session ID.
job_id: Optional background job ID.
status: Optional task status filter.
status()
Return persistent download task status.
Downloader$status(session_id = NULL, job_id = NULL, task_id = NULL)
session_id: Optional session ID.
job_id: Optional background job ID.
task_id: Optional task ID vector.
A data frame of matching task records.
events()
Return persistent downloader event logs.
Downloader$events(session_id = NULL, job_id = NULL, task_id = NULL)
session_id: Optional session ID.
job_id: Optional background job ID.
task_id: Optional task ID vector.
A data frame of event records.
on()
Register an in-session downloader event callback.
Downloader$on(event, fun)
event: Event name.
fun: Callback function called with (event, downloader).
A callback token for $off().
off()
Remove a downloader event callback.
Downloader$off(token)
token: Callback token returned by $on().
TRUE when a callback was removed.
data_nodes()
Return historical data node download performance.
Downloader$data_nodes(service = NULL)
service: Optional ESGF service filter.
A data frame of data node performance records.
reset_data_nodes()
Reset historical data-node health records.
Downloader$reset_data_nodes(data_node = NULL, service = NULL)
data_node: Optional data-node host to reset.
service: Optional ESGF service filter.
The remaining data-node records.
record_probes()
Record URL probe outcomes from a download plan into node history.
Downloader$record_probes(plan, probed = TRUE)
plan: A download plan returned by $download_plan().
probed: Whether probe = TRUE was used to create the plan.
The current data-node records.
retry()
Requeue failed or cancelled persistent tasks.
Downloader$retry(
session_id = NULL,
task_id = NULL,
status = c("error", "cancelled")
)
session_id: Optional session ID.
task_id: Optional task ID vector.
status: Task statuses to requeue. Default:
c("error", "cancelled").
A data frame of requeued task records.
cancel()
Cancel queued or in-progress persistent download tasks.
Downloader$cancel(
session_id = NULL,
task_id = NULL,
status = c("queued", "downloading")
)
session_id: Optional session ID.
task_id: Optional task ID vector.
status: Task statuses to cancel. Default:
c("queued", "downloading").
A data frame of cancelled task records.
resume()
Resume queued or interrupted persistent tasks.
Downloader$resume(session_id = NULL, task_id = NULL, ...)
session_id: Optional session ID.
task_id: Optional task ID vector.
...: Additional arguments passed to $run().
A data frame of selected task records after the run.
verify()
Verify checksums for completed persistent tasks.
Downloader$verify(session_id = NULL, task_id = NULL)
session_id: Optional session ID.
task_id: Optional task ID vector.
A data frame of completed task records with a
checksum_ok column.
cleanup_tmp()
Clean up temporary files (.part and .done files)
Downloader$cleanup_tmp(all = FALSE)
all: If TRUE, removes all temporary files. If FALSE,
only removes orphaned files (no corresponding final file).
Default: FALSE.
Number of files removed.
\dontrun{
n_removed <- downloader$cleanup_tmp()
n_removed <- downloader$cleanup_tmp(all = TRUE)
}
get_tasks()
Get all async download tasks
Downloader$get_tasks()
A list of DownloadTask objects.
\dontrun{
tasks <- downloader$get_tasks()
}
get_task_status()
Get status of an async download task
Downloader$get_task_status(task_id)
task_id: Task ID returned by download(block = FALSE).
A list with task information including status, progress, etc.
\dontrun{
task_id <- dl$download(url, block = FALSE)
status <- dl$get_task_status(task_id)
}
wait_for_tasks()
Wait for all async download tasks to complete
Downloader$wait_for_tasks(task_ids = NULL, progress = TRUE)
task_ids: Optional vector of task IDs to wait for. If NULL,
waits for all tasks. Default: NULL.
progress: Whether to show progress. Default: TRUE.
A list of completed task statuses.
\dontrun{
task1 <- dl$download(url1, block = FALSE)
task2 <- dl$download(url2, block = FALSE)
results <- dl$wait_for_tasks()
}
cancel_task()
Cancel an async download task
Downloader$cancel_task(task_id)
task_id: Task ID returned by download(block = FALSE).
Logical TRUE if cancellation was successful, FALSE otherwise.
\dontrun{
task_id <- dl$download(url, block = FALSE)
# Cancel if needed
dl$cancel_task(task_id)
}
list_incomplete()
List incomplete downloads
Downloader$list_incomplete()
A data.frame with information about incomplete downloads.
\dontrun{
incomplete <- downloader$list_incomplete()
}
verify_checksum()
Verify file checksum
Downloader$verify_checksum(file, expected, type = "sha256")
file: Path to file to verify.
expected: Expected checksum value.
type: Checksum type ("md5" or "sha256"). Default: "sha256".
TRUE if checksum matches, FALSE otherwise.
\dontrun{
valid <- downloader$verify_checksum("data.nc", "abc123", "sha256")
}
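For readers who want to see what a checksum comparison amounts to, here is a small base R sketch using tools::md5sum(). It only covers the "md5" case and is an illustration of the idea, not the method's implementation; verify_md5() is a hypothetical helper.

```r
# Hypothetical helper: compare a file's MD5 digest with an expected value
verify_md5 <- function(file, expected) {
    actual <- unname(tools::md5sum(file))
    identical(tolower(actual), tolower(expected))
}

# Write the five bytes of "hello" to a temporary file
f <- tempfile(fileext = ".nc")
writeBin(charToRaw("hello"), f)

# The MD5 digest of "hello" is a well-known test value
ok <- verify_md5(f, "5d41402abc4b2a76b9719d911017c592")
bad <- verify_md5(f, "abc123")
```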
print()
Print downloader summary
Downloader$print()
The Downloader object itself, invisibly.
clone()
The objects of this class are cloneable with this method.
Downloader$clone(deep = FALSE)
deep: Whether to make a deep clone.
Hongyuan Jia
Get an EPW morphing backend
epw_morph_backend(name = "belcher")
name: Backend name.
An EpwMorphBackend object.
EPW morphing backends
epw_morph_backends()
A character vector of registered backend names.
EPW morphing periods
epw_morph_periods(...)
...: Named integer year vectors.
A data.table with columns period and year.
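To make the shape of the period table concrete, the sketch below builds a table with the same two columns from named integer year vectors using base R. It returns a plain data frame rather than a data.table, and make_periods() and the period labels are hypothetical stand-ins, not part of the package.

```r
# Hypothetical stand-in that expands named year vectors into a long table
make_periods <- function(...) {
    x <- list(...)
    stopifnot(length(x) > 0L, !is.null(names(x)), all(nzchar(names(x))))
    data.frame(
        period = rep(names(x), lengths(x)),
        year = as.integer(unlist(x, use.names = FALSE)),
        stringsAsFactors = FALSE
    )
}

# Two 30-year periods, one row per (period, year) pair
periods <- make_periods(`2050s` = 2040:2069, `2080s` = 2070:2099)
```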
EPW morphing recipe
epw_morph_recipe(name = "belcher", backend = name, methods = NULL)
name |
Recipe name. Defaults to |
backend |
Backend name. Defaults to |
methods |
Optional named character vector overriding morphing methods for backend steps. |
A recipe list.
Register an EPW morphing backend
epw_morph_register_backend(name, backend, overwrite = FALSE)
name: Backend name.
backend: An EpwMorphBackend object.
overwrite: Whether to replace an existing backend.
The backend object, invisibly.
Backend runner functions return epw_morph_result objects. Use
epw_morph_result() in custom backends after producing complete hourly EPW
weather data.
epw_morph_result( context, epw = context$epw, data, parts = list(), diagnostics = morpher__empty_diagnostics(), factors = NULL )
context: Canonical EPW morphing context supplied to the backend runner.
epw: EPW object associated with the result.
data: Complete hourly EPW weather data ready for Parquet output or EPW writing.
parts: Optional named list of intermediate backend result tables.
diagnostics: Optional backend diagnostic rows.
factors: Optional backend factor rows.
An epw_morph_result object.
EPW morphing variable sets
epw_morph_variables(level = c("recommended", "minimal", "extended"))
level |
Variable set level, an EpwMorphBackend object, or an
|
A character vector of CMIP variable IDs.
Create an EPW morpher
epw_morpher( store, epw, site_id = NULL, recipe = epw_morph_recipe("belcher"), label = NULL )
store: An EsgStore object.
epw: EPW path or an eplusr::Epw object.
site_id: Optional site identifier.
recipe: EPW morphing recipe.
label: Optional source label.
An EpwMorpher object.
EpwMorphBackend defines a statistical downscaling backend that can be
selected by epw_morph_recipe() and executed by EpwMorpher.
name: Backend name.
label: Human-readable backend label.
requires_reference: Whether the backend requires reference climate data.
new()
Create an EPW morphing backend.
EpwMorphBackend$new( name, label = NULL, methods = NULL, method_choices = NULL, rules, requires_reference = FALSE, runner )
name: Backend name.
label: Human-readable backend label.
methods: Named default method vector.
method_choices: Allowed method values.
rules: Backend rule table.
requires_reference: Whether reference climate data are required.
runner: Function taking (context, backend) and returning an
epw_morph_result.
methods()
Return default backend methods.
EpwMorphBackend$methods()
method_choices()
Return allowed backend method values.
EpwMorphBackend$method_choices()
rules()
Return backend rules.
EpwMorphBackend$rules()
required_variables()
Return required CMIP variable IDs.
EpwMorphBackend$required_variables()
validate_methods()
Validate and complete method overrides.
EpwMorphBackend$validate_methods(methods = NULL)
methods: Optional named method override vector.
rules_with_methods()
Return backend rules with methods applied.
EpwMorphBackend$rules_with_methods(methods = NULL)
methods: Optional named method override vector.
run()
Run this backend on a canonical EPW morphing context.
EpwMorphBackend$run(context)
context: Canonical EPW morphing context.
clone()
The objects of this class are cloneable with this method.
EpwMorphBackend$clone(deep = FALSE)
deep: Whether to make a deep clone.
EpwMorpher consumes completed EsgStore extraction outputs and creates
future EPW files through a store-backed morphing workflow.
new()
Create an EPW morpher.
EpwMorpher$new(
store,
epw,
site_id = NULL,
recipe = epw_morph_recipe("belcher"),
label = NULL
)
store: An EsgStore object.
epw: EPW path or an eplusr::Epw object.
site_id: Optional site identifier.
recipe: EPW morphing recipe.
label: Optional source label.
required_variables()
Return recipe-required CMIP variable IDs.
EpwMorpher$required_variables()
preflight()
Preflight EPW morphing inputs without writing store state.
EpwMorpher$preflight(
plan_id = NULL,
periods = NULL,
reference_plan_id = NULL,
reference_periods = NULL,
summary_id = NULL,
reference_summary_id = NULL,
baseline_id = NULL,
by = c("source_id", "experiment_id", "variant_label", "period"),
strict = TRUE
)
plan_id: Optional extraction plan IDs.
periods: Optional period table from epw_morph_periods().
reference_plan_id: Optional reference extraction plan IDs for change-factor backends.
reference_periods: Optional reference period table from
epw_morph_periods().
summary_id: Optional climate summary ID.
reference_summary_id: Optional reference climate summary ID for change-factor backends.
baseline_id: Optional baseline summary ID.
by: Climate grouping columns.
strict: Whether required-data issues are errors.
summarise_climate()
Summarise extracted climate data by period and month.
EpwMorpher$summarise_climate( plan_id, periods, strict = TRUE, overwrite = FALSE )
plan_id: Extraction plan IDs.
periods: Period table from epw_morph_periods().
strict: Whether incomplete extraction coverage is an error.
overwrite: Whether to replace existing rows for this summary.
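Summarising by period and month can be pictured as a join followed by a grouped mean. The sketch below does this in base R on synthetic daily temperatures; the variable name tas, the period label, and the data are all assumptions for illustration, and the real method works on store extraction outputs rather than an in-memory data frame.

```r
# Synthetic daily near-surface temperature for two years (values invented)
set.seed(1)
daily <- data.frame(
    date = seq(as.Date("2040-01-01"), as.Date("2041-12-31"), by = "day")
)
doy <- as.integer(format(daily$date, "%j"))
daily$tas <- 15 + 10 * sin(2 * pi * (doy - 100) / 365) + rnorm(nrow(daily), sd = 0.5)
daily$year <- as.integer(format(daily$date, "%Y"))
daily$month <- as.integer(format(daily$date, "%m"))

# Period table in the documented shape: one row per (period, year)
periods <- data.frame(period = "2040s", year = 2040:2041)

# Attach the period label by year, then average per period and month
joined <- merge(daily, periods, by = "year")
monthly <- aggregate(tas ~ period + month, data = joined, FUN = mean)
```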
summarise_baseline()
Summarise baseline EPW weather by month.
EpwMorpher$summarise_baseline(overwrite = FALSE)
overwrite: Whether to replace existing rows.
plan()
Create a morphing plan and monthly factors.
EpwMorpher$plan(
summary_id,
reference_summary_id = NULL,
baseline_id = NULL,
by = c("source_id", "experiment_id", "variant_label", "period"),
strict = TRUE,
overwrite = FALSE
)
summary_id: Climate summary ID.
reference_summary_id: Optional reference climate summary ID for change-factor backends.
baseline_id: Baseline summary ID. If NULL, baseline summary is created.
by: Climate grouping columns.
strict: Whether missing required variables are blocking errors.
overwrite: Whether to replace an existing plan.
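As a rough picture of what monthly factors are in a change-factor approach, the sketch below derives an absolute change (delta) and a fractional change (alpha) per month from reference and future monthly means. The exact factor definitions used by the package are not stated here, so this is an assumption based on the general technique; the variable name rsds and all values are invented.

```r
# Invented monthly means of surface downwelling shortwave radiation (W/m2)
reference <- data.frame(
    month = 1:12,
    rsds = c(60, 90, 140, 190, 230, 250, 245, 215, 160, 105, 65, 50)
)
# Assumed future climate: 5 % more radiation in every month
future <- data.frame(month = 1:12, rsds = reference$rsds * 1.05)

# Align both summaries by month and derive the two kinds of factor
factors <- merge(reference, future, by = "month", suffixes = c("_ref", "_fut"))
factors$delta <- factors$rsds_fut - factors$rsds_ref   # absolute monthly change
factors$alpha <- factors$delta / factors$rsds_ref      # fractional monthly change

# Flag months whose fractional change is above a threshold of 3
exceeds <- abs(factors$alpha) > 3
```

With these inputs every alpha is about 0.05, so no month is flagged.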
preview_plan()
Preview a morphing plan and monthly factors without writing store state.
EpwMorpher$preview_plan(
summary_id,
reference_summary_id = NULL,
baseline_id = NULL,
by = c("source_id", "experiment_id", "variant_label", "period"),
strict = TRUE
)
summary_id: Climate summary ID.
reference_summary_id: Optional reference climate summary ID for change-factor backends.
baseline_id: Baseline summary ID. If NULL, baseline summary is created.
by: Climate grouping columns.
strict: Whether missing required variables are blocking errors.
diagnose()
Diagnose a morphing plan.
EpwMorpher$diagnose(morph_id)
morph_id: Morphing plan ID.
check()
Abort if a morphing plan has blocking diagnostics.
EpwMorpher$check(morph_id)
morph_id: Morphing plan ID.
run()
Execute a morphing plan and write hourly result Parquet files.
EpwMorpher$run(morph_id, overwrite = FALSE, resume = TRUE)
morph_id: Morphing plan ID.
overwrite: Whether to overwrite existing result files.
resume: Whether to reuse complete existing results.
write_epw()
Write future EPW files from morphing results.
EpwMorpher$write_epw( morph_id, dir, separate = TRUE, overwrite = FALSE, resume = TRUE )
morph_id: Morphing plan ID.
dir: Output directory. Relative paths are resolved under the store
root. If NULL, the workflow stops after writing morph result
Parquet files and does not write EPW outputs.
separate: Whether to create case subdirectories.
overwrite: Whether to overwrite existing EPW files.
resume: Whether to reuse complete existing EPW outputs.
workflow()
Run the store-native EPW morphing workflow.
EpwMorpher$workflow(
plan_id,
periods,
reference_plan_id = NULL,
reference_periods = NULL,
by = c("source_id", "experiment_id", "variant_label", "period"),
strict = TRUE,
dir = "outputs/future-epw",
separate = TRUE,
overwrite = FALSE,
resume = TRUE
)
plan_id: Extraction plan IDs.
periods: Period table from epw_morph_periods().
reference_plan_id: Optional reference extraction plan IDs for change-factor backends.
reference_periods: Optional reference period table from
epw_morph_periods().
by: Climate grouping columns.
strict: Whether blocking diagnostics should abort the workflow.
dir: Output directory. Relative paths are resolved under the store root.
separate: Whether to create case subdirectories.
overwrite: Whether to overwrite existing plan, result, and EPW outputs.
resume: Whether to reuse complete existing result and EPW outputs.
status()
Return morphing plan status rows.
EpwMorpher$status(morph_id = NULL)
morph_id: Optional morphing plan IDs.
outputs()
Return future EPW output rows.
EpwMorpher$outputs(morph_id = NULL)
morph_id: Optional morphing plan IDs.
clone()
The objects of this class are cloneable with this method.
EpwMorpher$clone(deep = FALSE)
deep: Whether to make a deep clone.
Hongyuan Jia
epwshiftr_cli() is the package-level entry point used by the optional
epwshiftr launcher. It exposes a small ESGF store management interface and
returns status metadata when exit = FALSE, which makes it testable from R.
epwshiftr_cli(args = commandArgs(trailingOnly = TRUE), exit = FALSE)
args |
Command line arguments. Defaults to
|
exit |
Whether to terminate the current R process with the command
status. Default: |
Invisibly, a list with status, result, and error.
esg_result() creates an empty query result object of input type, so that
you can load the saved JSON file via EsgResult$load().
esg_result(type = c("dataset", "file", "aggregation"))
type |
A string indicating what type of ESGF query result should be
created. Should be one of |
An empty EsgResult object of given type.
EsgDataset provides a unified interface for accessing NetCDF data
remotely via OPeNDAP protocol. It wraps RNetCDF functions and provides
convenient methods for subsetting, slicing, and reading data without
downloading entire files.
The class supports three levels of interfaces:
Basic layer: Direct wrappers around RNetCDF functions
Middle layer: Convenient methods for subsetting by time/space
High layer: Data manipulation and format conversion
It also supports aggregating multiple files into a single logical dataset, automatically handling time dimension concatenation.
url: The OPeNDAP URL(s).
is_open: Whether the connection is open.
is_aggregated: Whether the dataset contains multiple files.
file_count: Number of files in the dataset.
time_filter: A result-level time filter recorded by
EsgResultFile$filter_time() or
EsgResultAggregation$filter_time(), or NULL.
new()
Create a new EsgDataset object
EsgDataset$new(urls)
urls: A character vector of OPeNDAP URLs. Can be a single URL or multiple URLs for a multi-file dataset.
An EsgDataset object.
\dontrun{
# Single file
ds <- EsgDataset$new("https://example.com/data.nc")
# Multiple files
ds <- EsgDataset$new(c("url1.nc", "url2.nc"))
}
open()
Open OPeNDAP connection(s)
EsgDataset$open(
async = FALSE,
timeout = NULL,
progress = getOption("epwshiftr.progress", interactive())
)
async: If TRUE, first validates opening in a one-shot worker,
then re-opens caller-owned handles before returning so the
dataset remains open after open() returns. The caller still
receives the final EsgDataset object itself rather than a
Mirai/Future-like handle. Default: FALSE.
timeout: Optional positive number of seconds for the async
worker pre-open phase. Only supported when async = TRUE.
It does not limit the final caller-owned reopen that keeps the
returned dataset open.
progress: Whether to show a progress bar while opening
NetCDF/OPeNDAP handles. By default the package option
epwshiftr.progress is used, falling back to interactive().
The EsgDataset object itself, invisibly.
\dontrun{
ds$open()
# Returns the opened dataset directly; no Mirai/Future to collect.
ds$open(async = TRUE, timeout = 10)
}
close()
Close OPeNDAP connection(s)
EsgDataset$close()
The EsgDataset object itself, invisibly.
\dontrun{
ds$close()
}
slice()
Select files from this dataset by file position.
$slice() creates a new EsgDataset with a subset of the current
dataset URLs. This is a file/URL-level operation; NetCDF variable,
dimension, time, and spatial slicing is still performed by
$var_get(), $read_array(), $read_data_table(), and
$read_region().
EsgDataset$slice(i, reopen = FALSE)
i: A positive or negative integer vector, or a logical vector with one value per file.
reopen: Whether to open the returned dataset when the current
dataset is already open. If FALSE (default), slicing an
open dataset raises an error because RNetCDF handles cannot
be safely shared between dataset objects.
A new EsgDataset object.
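The selector i follows ordinary R vector subsetting rules. The following is a minimal base-R sketch of those rules, assuming the dataset URLs are held in a plain character vector; select_urls() and the file names are hypothetical and are not part of epwshiftr.

```r
# Hypothetical stand-in for the URLs held by a three-file dataset.
urls <- c("tas_2015.nc", "tas_2016.nc", "tas_2017.nc")

# Illustrative helper (not an epwshiftr function): pick files by position.
select_urls <- function(urls, i) {
  if (is.logical(i)) {
    # A logical selector needs exactly one value per file.
    stopifnot(length(i) == length(urls))
  } else {
    # Positions must be whole numbers and must not mix signs.
    stopifnot(is.numeric(i), all(i == trunc(i)), all(i > 0) || all(i < 0))
  }
  urls[i]
}

first_two <- select_urls(urls, 1:2)                   # keep files 1 and 2
without_first <- select_urls(urls, -1L)               # drop file 1
by_flag <- select_urls(urls, c(TRUE, FALSE, TRUE))    # keep files 1 and 3
```

Because the selection happens at the URL level, no NetCDF handle is involved in this step; reading values from the selected files is a separate operation.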
reachable()
Probe whether this dataset's current files or URLs are reachable.
$reachable() checks the actual URLs or local paths stored in the
dataset. It does not reuse reachability checks from an EsgResult;
opened datasets, fallback downloads, and manually created datasets
are always evaluated from their current url values.
EsgDataset$reachable(level = c("data_node", "url"), probe = NULL)
level: Probe level. "data_node" probes the root URL of each
remote data node; "url" probes the actual dataset URL.
Default: "data_node".
probe: Optional named list of probe settings. Supported fields
are timeout, concurrency, network_policy,
cache_seconds, and cache_failures_seconds.
A data.table with columns
file_index, source_index, data_node, service, url,
reachable, latency_ms, error, probe_level,
probe_url, and probe_cached.
file_inq()
Get file information
EsgDataset$file_inq(index = 1L)
index: File index for multi-file datasets. Default: 1L.
A list with file information.
\dontrun{
info <- ds$file_inq()
}
var_inq()
Get variable information
EsgDataset$var_inq(var, index = 1L)
var: Variable name or ID.
index: File index for multi-file datasets. Default: 1L.
A list with variable information.
\dontrun{
var_info <- ds$var_inq("tas")
}
dim_inq()
Get dimension information
EsgDataset$dim_inq(dim, index = 1L)
dim: Dimension name or ID.
index: File index for multi-file datasets. Default: 1L.
A list with dimension information.
\dontrun{
dim_info <- ds$dim_inq("time")
}
att_get()
Get attribute value
EsgDataset$att_get(var, att, index = 1L)
var: Variable name or ID, or "NC_GLOBAL" for global attributes.
att: Attribute name.
index: File index for multi-file datasets. Default: 1L.
The attribute value.
\dontrun{
units <- ds$att_get("tas", "units")
}
var_get()
Read variable data
EsgDataset$var_get( var, start = NULL, count = NULL, index = 1L, collapse = FALSE, async = FALSE, timeout = NULL )
var: Variable name or ID.
start: Starting indices (1-based), one per dimension. If NULL, reading starts from the beginning.
count: Number of values to read along each dimension. If NULL, all values are read.
index: File index for multi-file datasets. Default: 1L.
collapse: Whether to collapse the result. Default: FALSE.
async: If TRUE, perform the variable read in a one-shot
worker and return the final array directly once complete.
No Mirai/Future object is exposed. Default: FALSE.
timeout: Optional positive number of seconds for the async
worker phase. Only supported when async = TRUE.
An array with variable data.
\dontrun{
data <- ds$var_get("tas")
data_subset <- ds$var_get("tas", start = c(1, 1, 1), count = c(10, 10, 1))
# Returns the final array directly; no Mirai/Future handling required.
data_async <- ds$var_get("tas", async = TRUE, timeout = 10)
}
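The start and count arguments describe a rectangular block (a hyperslab) of the variable. The sketch below mimics that selection on an in-memory array with base R only, assuming the usual NetCDF convention that start gives the first index per dimension and count the number of values per dimension; read_hyperslab() is a hypothetical helper, not part of epwshiftr or RNetCDF.

```r
# Illustrative helper: select a start/count block from an in-memory array.
read_hyperslab <- function(x, start = NULL, count = NULL) {
  dims <- dim(x)
  # NULL start means "begin at the first element of every dimension".
  if (is.null(start)) start <- rep(1L, length(dims))
  # NULL count means "read from start to the end of every dimension".
  if (is.null(count)) count <- dims - start + 1L
  stopifnot(length(start) == length(dims), length(count) == length(dims))
  # Build one index vector per dimension and subset without dropping dims.
  idx <- Map(function(s, n) seq.int(s, length.out = n), start, count)
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

# A toy 4 x 3 x 2 array standing in for a (lon, lat, time) variable.
x <- array(seq_len(24), dim = c(4, 3, 2))

# Two longitudes starting at index 2, all three latitudes, first time step.
block <- read_hyperslab(x, start = c(2, 1, 1), count = c(2, 3, 1))
whole <- read_hyperslab(x)
```

Keeping drop = FALSE preserves the number of dimensions, which makes it easier to line the values up with the coordinate axes afterwards.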
get_variables()
List all variables in the dataset
EsgDataset$get_variables(index = 1L)
index: File index for multi-file datasets. Default: 1L.
A character vector of variable names.
\dontrun{
vars <- ds$get_variables()
}
get_dimensions()
List all dimensions in the dataset
EsgDataset$get_dimensions(index = 1L)
index: File index for multi-file datasets. Default: 1L.
A character vector of dimension names.
\dontrun{
dims <- ds$get_dimensions()
}
get_time_axis()
Get time axis information
EsgDataset$get_time_axis(index = 1L)
index: File index for multi-file datasets. Default: 1L.
A list containing time values, units, and calendar.
\dontrun{
time_info <- ds$get_time_axis()
}
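Time values in NetCDF files are usually numeric offsets from a reference date given in the units string. The following base-R sketch decodes such offsets, assuming units of the form "days since YYYY-MM-DD" and a calendar compatible with R's Date class; the time_info list and its element names are hypothetical stand-ins and may differ from what $get_time_axis() actually returns. Calendars such as "noleap" or "360_day" need dedicated handling and are not covered here.

```r
# Hypothetical time-axis list; the element names are assumed for illustration.
time_info <- list(
  values = c(0, 31, 59),
  units = "days since 1850-01-01",
  calendar = "standard"
)

# Illustrative helper: convert "days since <date>" offsets to Date values.
decode_days_since <- function(values, units) {
  pattern <- "^days since ([0-9]{4}-[0-9]{2}-[0-9]{2})"
  m <- regmatches(units, regexec(pattern, units))[[1]]
  # m holds the full match and the captured reference date when matched.
  if (length(m) != 2L) stop("Unsupported time units: ", units)
  as.Date(m[2]) + values
}

dates <- decode_days_since(time_info$values, time_info$units)
```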
get_spatial_grid()
Get spatial grid information (latitude and longitude)
EsgDataset$get_spatial_grid(index = 1L)
index: File index for multi-file datasets. Default: 1L.
A list containing latitude and longitude values.
\dontrun{
grid <- ds$get_spatial_grid()
}
read_array()
Read variable data as a list of arrays (one per file)
EsgDataset$read_array( variable, start = NULL, count = NULL, collapse = FALSE, async = FALSE, timeout = NULL )
variable: Variable name.
start: Starting indices. If NULL, reading starts from the beginning.
count: Number of values to read along each dimension. If NULL, all values are read.
collapse: Whether to collapse the result. Default: FALSE.
async: If TRUE, read array values in a one-shot worker and
return the final list directly once complete. No
Mirai/Future object is exposed. Default: FALSE.
timeout: Optional positive number of seconds for the async
worker phase. Only supported when async = TRUE.
A list of arrays with variable data. Each element corresponds to a file in the dataset.
\dontrun{
data_list <- ds$read_array("tas")
data <- data_list[[1]]
# Returns the final list directly; no Mirai/Future handling required.
data_list_async <- ds$read_array("tas", async = TRUE, timeout = 10)
}
read_data_table()
Read variable data as a list of data.table (one per file)
EsgDataset$read_data_table( variable, start = NULL, count = NULL, rbind = FALSE, async = FALSE, timeout = NULL )
variable: Variable name.
start: Starting indices. If NULL, reading starts from the beginning.
count: Number of values to read along each dimension. If NULL, all values are read.
rbind: If TRUE, return a single data.table by row-binding
the per-file results with data.table::rbindlist(..., idcol = "file_index").
Default: FALSE.
async: If TRUE, offload the array read phase to a one-shot
worker and still return the final data.table result directly.
No Mirai/Future object is exposed. Default: FALSE.
timeout: Optional positive number of seconds for the async
worker phase. Only supported when async = TRUE.
If rbind = FALSE, a list of data.table (one per file).
If rbind = TRUE, a single data.table with an extra file_index column.
\dontrun{
dt_list <- ds$read_data_table("tas")
dt <- dt_list[[1]]
dt_all <- ds$read_data_table("tas", rbind = TRUE)
# Returns the final data.table directly; no Mirai/Future handling required.
dt_async <- ds$read_data_table("tas", async = TRUE, timeout = 10)
}
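The effect of rbind = TRUE can be illustrated with data.table::rbindlist() directly. This is a small sketch on made-up per-file tables, assuming each file yields a table with the same columns; the values and column names are hypothetical and only show how the file_index column arises.

```r
library(data.table)

# Hypothetical per-file results, one data.table per file.
per_file <- list(
  data.table(time = as.Date(c("2050-01-16", "2050-02-15")), value = c(300.1, 300.4)),
  data.table(time = as.Date(c("2051-01-16", "2051-02-15")), value = c(300.3, 300.9))
)

# Row-bind and record which list element each row came from.
combined <- rbindlist(per_file, idcol = "file_index")
```

For an unnamed list, the id column holds the integer position of the source element, so rows can still be traced back to their file after binding.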
read_region()
Read variable values near a target coordinate and optional time range
EsgDataset$read_region( variable, lon, lat, time = "auto", method = "nearest", rbind = TRUE, async = FALSE, timeout = NULL )
variable: Character vector of variable names.
lon: Target longitude.
lat: Target latitude.
time: Time range to read. Use "auto" to reuse the time
range recorded by EsgResultFile$filter_time() or
EsgResultAggregation$filter_time() when available; if no
recorded range exists, all times are read. Use NULL to
always read the full time axis. A length-2 character,
Date, or POSIXt range is parsed in UTC and used
explicitly. Default: "auto".
method: Grid extraction method. One of "nearest", "idw",
"bilinear", or "mean". Default: "nearest".
rbind: If TRUE, return one data.table. If FALSE, return a
list of per-file, per-variable data.tables. Default: TRUE.
async: If TRUE, offload each NetCDF variable read to a
one-shot worker. Default: FALSE.
timeout: Optional positive number of seconds for each async
read. Only supported when async = TRUE.
A data.table or list of data.tables with columns including
file_index, variable, time, lon, lat, method, and
value. The "grid_sources" attribute records contributing grid
coordinates and weights.
\dontrun{
dt <- ds$read_region(
variable = c("tas", "hurs"),
lon = 103.98,
lat = 1.37,
time = c("2050-01-01", "2050-12-31")
)
}
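To make the idea behind method = "nearest" concrete, the sketch below locates the grid cell closest to a target coordinate on a regular grid using base R. It is an illustration under stated assumptions, not the package's implementation: nearest_cell() is a hypothetical helper, the 2.5-degree axes are made up, and the nearest index is chosen independently along each axis rather than by great-circle distance.

```r
# Illustrative helper: index of the nearest cell on a regular lon/lat grid.
nearest_cell <- function(lon_axis, lat_axis, lon, lat) {
  # Compare longitudes on a circle so that 359 and 1 are 2 degrees apart.
  d_lon <- abs((lon_axis %% 360) - (lon %% 360))
  d_lon <- pmin(d_lon, 360 - d_lon)
  i <- which.min(d_lon)
  j <- which.min(abs(lat_axis - lat))
  list(lon_index = i, lat_index = j, lon = lon_axis[i], lat = lat_axis[j])
}

# Hypothetical 2.5-degree grid with longitudes in [0, 360).
lon_axis <- seq(0, 357.5, by = 2.5)
lat_axis <- seq(-90, 90, by = 2.5)

# Same target coordinate as in the example above.
cell <- nearest_cell(lon_axis, lat_axis, lon = 103.98, lat = 1.37)
```

The resulting indices are the kind of values that would feed a start/count read of a single grid column across the time axis.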
selection()
Return file selection provenance for this dataset.
$selection() maps the current dataset file positions back to the
result rows that produced the dataset when that information is
available. It does not record intermediate filter steps.
EsgDataset$selection()
A list with source_count, source_num_found, and
source_indices.
print()
Print dataset summary
EsgDataset$print()
The EsgDataset object itself, invisibly.
clone()
The objects of this class are cloneable with this method.
EsgDataset$clone(deep = FALSE)
deep: Whether to make a deep clone.
Hongyuan Jia
EsgDict is an R6 class for project-specific ESG
controlled vocabulary data. It stores vocabulary tables, optional request
tables, normalized query indices, and source metadata used by local option
discovery and legality checks.
esgdict() is a small constructor around EsgDict$new().
esgdict(project = "CMIP6")
esgdict_set_default(dict)
esgdict_get_default(project = "CMIP6")
project: ESG project identifier, such as "CMIP6" or "CMIP6PLUS".
dict: An EsgDict object used as the package-level default dictionary for its project.
esgdict() returns a new EsgDict object.
esgdict_set_default() returns dict, invisibly.
esgdict_get_default() returns the current package-level default
dictionary for project, or NULL.
The dictionary currently supports "CMIP6", "CMIP6PLUS", "INPUT4MIP",
"OBS4REF", "CORDEX-CMIP6", "CMIP7", and "EMD". CMIP6 dictionaries
include both controlled vocabularies and CMOR request-table data. Other
projects use vocabulary data only until a project-specific request source is
registered.
Building a dictionary may download upstream vocabulary/request sources when
the parsed dictionary cache and raw source cache are missing. Most examples
load a small installed CMIP6 example dictionary and run without network
access. The example that calls $build() is wrapped in \dontrun{} so
package checks do not depend on GitHub or upstream CV availability.
new()
Create a new ESG project dictionary.
The new dictionary is empty. Use
$build() to fetch and parse
upstream sources, or $load() to
restore a saved dictionary JSON file.
EsgDict$new(project = "CMIP6")
project: ESG project identifier, such as "CMIP6" or
"CMIP6PLUS".
An EsgDict object.
dict <- EsgDict$new(project = "CMIP6")
dict$status()
project()
Return the normalized ESG project identifier.
EsgDict$project()
A single string.
dict <- EsgDict$new(project = "CMIP6PLUS")
dict$project()
profile()
Return the internal dictionary profile.
The profile determines how project-specific vocabulary sources are parsed and normalized.
EsgDict$profile()
A single string.
dict <- EsgDict$new(project = "CMIP6")
dict$profile()
version()
Return vocabulary and request-source versions.
Empty or partially loaded dictionaries return NULL.
EsgDict$version()
A named list with vocab and request elements, or NULL.
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(path))
dict$version()
sources()
Return upstream source metadata.
Source metadata records repository, tag/ref, commit, and local source directory information for the data used to build the dictionary.
EsgDict$sources()
A named list, or NULL for an empty dictionary.
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(path))
dict$sources()
timestamp()
Return source vocabulary timestamps.
Timestamps are extracted from source vocabulary metadata when available.
EsgDict$timestamp()
A named list of timestamps, or NULL.
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(path))
dict$timestamp()
built_time()
Return the time when this dictionary was built.
EsgDict$built_time()
A POSIXct value, or NULL.
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(path))
dict$built_time()
status()
Return the dictionary lifecycle status.
Status values are:
"empty": no vocabulary/request payload is loaded.
"partial": some required payload is missing.
"built": the dictionary was built in this R session.
"loaded": the dictionary was restored from disk.
EsgDict$status()
A single string.
dict <- EsgDict$new(project = "CMIP6")
dict$status()
has_data()
Check whether the dictionary contains usable data.
A dictionary has usable data after a complete
$build() or
$load().
EsgDict$has_data()
TRUE or FALSE.
dict <- EsgDict$new(project = "CMIP6")
dict$has_data()
is_empty()
Check whether the dictionary is empty.
EsgDict$is_empty()
TRUE or FALSE.
dict <- EsgDict$new(project = "CMIP6")
dict$is_empty()
build()
Build the dictionary from upstream source data.
$build() resolves the configured project vocabulary source, downloads
or reuses raw source files as needed, parses them into normalized
tables, and builds query indices for option discovery and validation.
If the dictionary already has data and force = FALSE, the object is
returned unchanged.
EsgDict$build( token = NULL, force = FALSE, cv_tag = NULL, request_tag = NULL, dreq_tag = NULL, use_cache = TRUE, source_dir = dict__source_dir(project = private$m_project) )
token: Optional GitHub token used for source resolution and downloads.
force: If TRUE, rebuild even when the dictionary already has
data and bypass the parsed dictionary cache.
cv_tag: Optional vocabulary source tag or ref. When NULL, the
project default ref or latest tagged source is used.
request_tag: Optional request-table source tag. Used by projects that define a request source, currently CMIP6.
dreq_tag: Deprecated alias for request_tag.
use_cache: If TRUE, use the parsed dictionary cache when
available. Raw source files may still be reused from source_dir.
source_dir: Directory used to read and write raw source files. The default is the package store source directory for this project.
The modified EsgDict object itself.
\dontrun{
dict <- EsgDict$new(project = "CMIP6")
dict$build()
dict$has_data()
}
get()
Return raw dictionary payload data.
$get("vocab") returns the full vocabulary payload list.
$get("request") and $get("dreq") return the request table when
available. Any other value is interpreted as a vocabulary field name,
such as "experiment_id" or "source_id".
EsgDict$get(type)
type: Data type to retrieve. Use "vocab", "request",
"dreq", or a project vocabulary field name.
A copy of the requested data.
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(path))
dict$get("experiment_id")
dict$get("request")
capabilities()
Return available dictionary capabilities.
Capabilities describe whether vocabulary data, request data, and relation indices are currently available.
EsgDict$capabilities()
A named list with vocab, request, and relations.
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(path))
dict$capabilities()
relation_fields()
Return supported relation-index fields.
Relation fields describe which field combinations can be used for constrained option discovery and cross-field legality checks.
EsgDict$relation_fields()
A named list of character vectors.
dict <- EsgDict$new(project = "CMIP6")
dict$relation_fields()
fields()
Return normalized dictionary field names.
Empty dictionaries return character().
EsgDict$fields()
A character vector.
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(path))
dict$fields()
indices()
Return normalized dictionary indices.
$indices() returns all available indices. $indices(type) returns a
single index table, such as "values", "variable",
"activity_experiment", or "activity_source".
EsgDict$indices(type = NULL)
type: Optional index name.
A named list of indices, or a data.table::data.table() when
type is supplied.
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(path))
names(dict$indices())
dict$indices("values")
options()
Discover valid values for a dictionary field.
Constraints supplied through ... are used when a matching relation
index exists. For example, CMIP6 experiment_id options can be
constrained by activity_id.
EsgDict$options(field, ...)
field: ESG dictionary field name or supported alias.
...: Optional field constraints.
A data.table::data.table() with available values and
metadata. The ignored_constraints attribute records constraints
that could not be applied.
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(path))
dict$options("experiment_id", activity_id = "CMIP")
check()
Check dictionary values and relationships.
$check() validates supplied values against dictionary value indices
and, when possible, validates cross-field combinations using relation
indices.
EsgDict$check(
...,
error = FALSE,
suggest = TRUE,
n_suggestions = 5L,
relationship = c("any", "all_pairs")
)
...: ESG dictionary field values.
error: If TRUE, throw an error when invalid values or
relationships are found.
suggest: If TRUE, include near-match suggestions for invalid
values.
n_suggestions: Maximum number of suggestions for each invalid value.
relationship: Relationship validation mode. "any" validates
ESGF-query style OR semantics. "all_pairs" requires every supplied
combination inside each relation index to exist.
An esgdict_check_result data.table::data.table().
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(path))
dict$check(activity_id = "CMIP", experiment_id = "historical")
dict$check(variable_id = "tas", table_id = "Amon")
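The difference between the two relationship modes can be sketched with a toy relation index in base R. This is one possible reading of the description above, not the package's implementation: check_pairs() is a hypothetical helper, the three-row relation table is a tiny made-up excerpt, "any" is interpreted as "at least one supplied combination is legal", and "all_pairs" as "every supplied combination is legal".

```r
# Toy relation index: which experiment_id belongs to which activity_id.
relation <- data.frame(
  activity_id = c("CMIP", "CMIP", "ScenarioMIP"),
  experiment_id = c("historical", "piControl", "ssp585"),
  stringsAsFactors = FALSE
)

# Illustrative helper: validate supplied values against the relation index.
check_pairs <- function(relation, activity_id, experiment_id,
                        relationship = c("any", "all_pairs")) {
  relationship <- match.arg(relationship)
  # Every combination of the supplied values.
  combos <- expand.grid(
    activity_id = activity_id,
    experiment_id = experiment_id,
    stringsAsFactors = FALSE
  )
  known <- paste(relation$activity_id, relation$experiment_id, sep = "\t")
  exists <- paste(combos$activity_id, combos$experiment_id, sep = "\t") %in% known
  if (relationship == "any") any(exists) else all(exists)
}

activities <- c("CMIP", "ScenarioMIP")
experiments <- c("historical", "ssp585")

ok_any <- check_pairs(relation, activities, experiments, "any")
ok_all <- check_pairs(relation, activities, experiments, "all_pairs")
```

Under this reading, a query that mixes two activities with one experiment from each passes in "any" mode, while "all_pairs" rejects it because combinations such as CMIP with ssp585 are not in the index.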
save()
Save the dictionary to JSON.
If path = NULL, the dictionary is saved in the package store and
registered in the store manifest. If path is supplied, only that JSON
file is written.
EsgDict$save(path = NULL, allow_empty = FALSE)
path: Optional JSON file path. If NULL, use the package store.
allow_empty: If TRUE, allow saving an empty dictionary.
The normalized output path.
dict_path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(dict_path))
path <- tempfile(fileext = ".json")
dict$save(path)
file.exists(path)
load()
Load a dictionary from JSON.
If path = NULL, the latest stored dictionary for this project is
located through the package store manifest.
EsgDict$load(path = NULL)
path: Optional JSON file path. If NULL, load the latest stored
dictionary for this project.
The modified EsgDict object itself.
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
restored <- EsgDict$new(project = "CMIP6")
suppressMessages(restored$load(path))
restored$has_data()
print()
Print a dictionary summary.
EsgDict$print()
The EsgDict object itself, invisibly.
path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- EsgDict$new(project = "CMIP6")
suppressMessages(dict$load(path))
dict$print()
Hongyuan Jia
See esgdict_option() and esgdict_check() for user-facing discovery
and validation helpers.
example_path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr")
dict <- esgdict(project = "CMIP6")
suppressMessages(dict$load(example_path))
dict$project()
esgdict_set_default(dict)
identical(esgdict_get_default("CMIP6"), dict)
esgdict_option("experiment_id", activity_id = "CMIP")
esgdict_check(activity = "CMIP", experiment = "historical")
EsgDict$new(project = "CMIP6") suppressMessages(dict$load(path)) dict$timestamp() ## ------------------------------------------------ ## Method `EsgDict$built_time` ## ------------------------------------------------ path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr") dict <- EsgDict$new(project = "CMIP6") suppressMessages(dict$load(path)) dict$built_time() ## ------------------------------------------------ ## Method `EsgDict$status` ## ------------------------------------------------ dict <- EsgDict$new(project = "CMIP6") dict$status() ## ------------------------------------------------ ## Method `EsgDict$has_data` ## ------------------------------------------------ dict <- EsgDict$new(project = "CMIP6") dict$has_data() ## ------------------------------------------------ ## Method `EsgDict$is_empty` ## ------------------------------------------------ dict <- EsgDict$new(project = "CMIP6") dict$is_empty() ## ------------------------------------------------ ## Method `EsgDict$build` ## ------------------------------------------------ ## Not run: dict <- EsgDict$new(project = "CMIP6") dict$build() dict$has_data() ## End(Not run) ## ------------------------------------------------ ## Method `EsgDict$get` ## ------------------------------------------------ path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr") dict <- EsgDict$new(project = "CMIP6") suppressMessages(dict$load(path)) dict$get("experiment_id") dict$get("request") ## ------------------------------------------------ ## Method `EsgDict$capabilities` ## ------------------------------------------------ path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr") dict <- EsgDict$new(project = "CMIP6") suppressMessages(dict$load(path)) dict$capabilities() ## ------------------------------------------------ ## Method `EsgDict$relation_fields` ## ------------------------------------------------ dict <- EsgDict$new(project = 
"CMIP6") dict$relation_fields() ## ------------------------------------------------ ## Method `EsgDict$fields` ## ------------------------------------------------ path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr") dict <- EsgDict$new(project = "CMIP6") suppressMessages(dict$load(path)) dict$fields() ## ------------------------------------------------ ## Method `EsgDict$indices` ## ------------------------------------------------ path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr") dict <- EsgDict$new(project = "CMIP6") suppressMessages(dict$load(path)) names(dict$indices()) dict$indices("values") ## ------------------------------------------------ ## Method `EsgDict$options` ## ------------------------------------------------ path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr") dict <- EsgDict$new(project = "CMIP6") suppressMessages(dict$load(path)) dict$options("experiment_id", activity_id = "CMIP") ## ------------------------------------------------ ## Method `EsgDict$check` ## ------------------------------------------------ path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr") dict <- EsgDict$new(project = "CMIP6") suppressMessages(dict$load(path)) dict$check(activity_id = "CMIP", experiment_id = "historical") dict$check(variable_id = "tas", table_id = "Amon") ## ------------------------------------------------ ## Method `EsgDict$save` ## ------------------------------------------------ dict_path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr") dict <- EsgDict$new(project = "CMIP6") suppressMessages(dict$load(dict_path)) path <- tempfile(fileext = ".json") dict$save(path) file.exists(path) ## ------------------------------------------------ ## Method `EsgDict$load` ## ------------------------------------------------ path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr") 
restored <- EsgDict$new(project = "CMIP6") suppressMessages(restored$load(path)) restored$has_data() ## ------------------------------------------------ ## Method `EsgDict$print` ## ------------------------------------------------ path <- system.file("extdata", "examples", "cmip6-dict.json", package = "epwshiftr") dict <- EsgDict$new(project = "CMIP6") suppressMessages(dict$load(path)) dict$print()
esgdict_check() validates project parameter values against the local ESG
dictionary. It checks individual field values and cross-field relationships
represented in the dictionary's normalized query indices.
esgdict_check(
  ...,
  project = NULL,
  dict = NULL,
  error = FALSE,
  suggest = TRUE,
  n_suggestions = 5L,
  relationship = c("any", "all_pairs")
)
... |
ESG dictionary field values. |
project |
ESG project identifier. If |
dict |
Optional EsgDict object. If |
error |
If |
suggest |
If |
n_suggestions |
Maximum number of suggestions to keep for each invalid value. |
relationship |
Relationship validation mode. |
An esgdict_check_result data.table::data.table().
esgdict_option() returns valid values for an ESG dictionary field,
optionally constrained by other supplied fields.
esgdict_option(field, ..., project = NULL, dict = NULL, warn_ignored = TRUE)
field |
An ESG dictionary field name or supported alias. |
... |
Optional field constraints, such as |
project |
ESG project identifier. If |
dict |
Optional EsgDict object. If |
warn_ignored |
If |
A data.table::data.table() with at least field, value, and
description columns. The ignored_constraints attribute records
constraints that were not used.
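The two helpers can be combined in one short workflow. The sketch below reuses the example dictionary shipped with the package and assumes the development version documented here is installed. The wrapper name `cmip_experiment_options` is hypothetical, and the package calls sit inside a function so that nothing runs until it is invoked.

```r
# Hypothetical wrapper; assumes the development version of epwshiftr
# (providing esgdict(), esgdict_option() and esgdict_check()) is installed.
cmip_experiment_options <- function() {
  path <- system.file("extdata", "examples", "cmip6-dict.json",
                      package = "epwshiftr")
  dict <- epwshiftr::esgdict(project = "CMIP6")
  suppressMessages(dict$load(path))

  # Valid experiment_id values, constrained to the CMIP activity.
  opts <- epwshiftr::esgdict_option("experiment_id", activity_id = "CMIP",
                                    dict = dict)

  # Validate a candidate combination against the same dictionary.
  chk <- epwshiftr::esgdict_check(activity_id = "CMIP",
                                  experiment_id = "historical", dict = dict)

  list(
    values  = opts$value,                          # documented `value` column
    ignored = attr(opts, "ignored_constraints"),   # constraints not used
    check   = chk
  )
}
```

Passing `dict` explicitly keeps the lookup independent of whichever default dictionary happens to be registered in the session.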
The Earth System Grid Federation (ESGF) is an international collaboration for the software that powers most global climate change research, notably assessments by the Intergovernmental Panel on Climate Change (IPCC).
The ESGF search service exposes RESTful APIs that can be used by clients to query the contents of the underlying search index, and return results matching the given constraints. The documentation of the APIs can be found using this link.
EsgQuery is the workhorse for dealing with ESGF search services.
Start with esg_query() / EsgQuery for new workflow code. The legacy
data.table-oriented API is available from the legacy branch or v0.1.4.
esg_query(index_node = "https://esgf-node.ornl.gov")
index_node |
The URL to the ESGF Index Node. Default is to use the ORNL (Oak Ridge National Laboratory) Index Node. Current possible values could be:
|
EsgQuery object
esg_query() returns an EsgQuery object, which is an R6
object whose methods fall into 3 categories:
Value listing: methods to list the available facets, fields, shards, and facet values.
Parameter getter & setter: methods to get the query parameter values or set them before sending the actual query to the ESGF search services.
Query responses: methods to collect results for the query response.
The EsgQuery object provides the following value-listing methods to query
available facets, fields, shards, and values from the ESGF index node:
EsgQuery$list_facets():
List all available facet names. When called, a
facet listing query
is sent to the index node to get all available facets for the current
project (default: CMIP6).
EsgQuery$list_fields():
List all available field names. This is useful for bridge index nodes
where facet listing is not available.
EsgQuery$list_shards():
List all available shards (ESGF index nodes) that can be queried in
distributed searches.
EsgQuery$list_values():
List all available values of specific facets.
The ESGF search services support many parameters. EsgQuery
provides dedicated methods to set values for most of them, including:
Most common keywords:
facets,
offset,
limit,
fields,
replica,
latest,
distrib
and
shards.
Most common facets:
project,
activity_id,
experiment_id,
source_id,
variable_id,
frequency,
variant_label,
nominal_resolution,
datetime_range,
timestamp_range,
version_range
and
data_node.
All methods act in a similar way:
If input is given, the corresponding parameter is set and the updated
EsgQuery object is returned.
This makes it possible to chain different parameter setters, e.g.
EsgQuery$project("CMIP6")$frequency("day")$limit(1) sets the parameter
project, frequency and limit sequentially.
For parameters that take character inputs, you can prefix the value with ! to
negate the constraint, e.g. EsgQuery$project(!"CMIP6") searches for
all projects except CMIP6.
If no input is given, the current parameter value is returned. For example,
directly calling EsgQuery$project() returns the current value of the
project parameter. The returned value is one of two types:
NULL, i.e. there is no constraint on the corresponding parameter
A QueryParam object. Use query_param__value() and
query_param__negate() to inspect it.
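Evaluated normally, `!"CMIP6"` is an error in R, so a setter that accepts it has to inspect the argument before evaluating it. The base R sketch below shows one way this can be done with `substitute()`. The helper `parse_negatable()` is hypothetical and is not the package's implementation.

```r
# Hypothetical helper illustrating the idea; not taken from epwshiftr.
parse_negatable <- function(value) {
  # Capture the argument as an unevaluated expression.
  expr <- substitute(value)

  # A leading `!` marks a negated constraint.
  negate <- is.call(expr) && identical(expr[[1L]], as.name("!"))
  if (negate) expr <- expr[[2L]]

  # Evaluate what remains in the caller's environment.
  list(value = eval(expr, parent.frame()), negate = negate)
}

parse_negatable("CMIP6")
parse_negatable(!c("CFMIP", "ScenarioMIP"))
```

The second call returns the character vector unchanged together with `negate = TRUE`, which is the same information that a value and negate pair carries.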
Besides the methods for specific keywords and facets, you can specify arbitrary
query parameters using the
EsgQuery$params() method. For
details on the usage, please see the
documentation.
The query is not sent unless related methods are called:
EsgQuery$count(): Count the total
number of records that match the query.
You can return only the total number of matched records by calling
EsgQuery$count(facets = FALSE)
You can also count the matched records for specified facets, e.g.
EsgQuery$count(facets = c("source_id", "activity_id"))
EsgQuery$collect(): Collect the
query results and format them into an EsgResultDataset object.
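The order of operations described above (set parameters first, send requests only on `$count()` or `$collect()`) can be sketched as follows. The helper name `preview_query` and the chosen facet values are illustrative assumptions. The code needs the development version of the package and a reachable index node, so it is wrapped in a function that is defined but not called.

```r
# Hypothetical helper; assumes the development version of epwshiftr is
# installed and that the chosen index node is reachable.
preview_query <- function(index_node = "https://esgf-node.ornl.gov") {
  q <- epwshiftr::esg_query(index_node = index_node)

  # Setters return the query object, so they can be chained.
  q <- q$project("CMIP6")$experiment_id("ssp585")$variable_id("tas")
  q <- q$frequency("day")$limit(5L)

  # Nothing has been sent so far; the two calls below issue the requests.
  n   <- q$count(facets = FALSE)  # total number of matched records
  res <- q$collect()              # an EsgResultDataset object

  list(total = n, result = res)
}
```

Counting first is a cheap way to see how large a result set is before collecting it.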
Some ESGF index nodes are "bridge" nodes that have certain limitations
compared to standard index nodes. When using a bridge index node (e.g.,
https://esgf-node.ornl.gov/esgf-1-5-bridge), the following restrictions
apply:
The fields parameter is not supported. All available fields are always
returned.
Only Dataset and File queries are supported. Aggregation queries
should use a standard ESGF search index node, such as
https://esgf-data.dkrz.de or https://esgf.ceda.ac.uk.
The retracted parameter is not supported and will be ignored.
Wget script generation is not supported. Calling $url(wget = TRUE) will
result in an error.
Facet listing is not available. $list_facets() will return a predefined
set of common facets instead. Use $list_fields() to get all available
fields.
The EsgQuery object also provides several other helper methods:
Query URL generation:
EsgQuery$url(): Returns the actual
query URL or the wget script URL which can be used to download all files
matching the given constraints.
State persistence:
EsgQuery$save(): Save the query
state to a JSON file for later use.
EsgQuery$load(): Restore the
query state from a JSON file created by $save().
Display:
EsgQuery$print(): Print a
summary of the current EsgQuery object including the index node URL
and all query parameters.
new()
Create a new EsgQuery object
EsgQuery$new(index_node = "https://esgf-node.ornl.gov")
index_node: The URL of the ESGF Index Node. The default is the ORNL (Oak Ridge National Laboratory) Index Node. Currently available nodes include:
ORNL (Oak Ridge National Laboratory), USA:
https://esgf-node.ornl.gov. The default value.
LLNL (Lawrence Livermore National Laboratory), USA:
https://esgf-node.llnl.gov
NCI (National Computational Infrastructure), Australia:
https://esgf.nci.org.au
IPSL (Institut Pierre-Simon Laplace), France:
https://esgf-node.ipsl.upmc.fr
DKRZ (Deutsches Klimarechenzentrum), Germany:
https://esgf-data.dkrz.de
LIU (National Academic Infrastructure for Supercomputing), Sweden:
https://esg-dn1.nsc.liu.se
CEDA (Centre for Environmental Data Analysis), UK:
https://esgf.ceda.ac.uk
An EsgQuery object.
\dontrun{
q <- EsgQuery$new(index_node = "https://esgf-node.ornl.gov")
q
}
index_node()
Get or set the ESGF index node.
$index_node() returns the current normalized index node URL.
$index_node(value) updates the index node after applying the same
normalization used by EsgQuery$new().
Existing query parameters are kept unchanged.
EsgQuery$index_node(value)
value: A string giving the new index node URL. If omitted, the current index node is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a string.
\dontrun{
q$index_node()
q$index_node("https://esgf.ceda.ac.uk")
}
list_facets()
List all available facet names
EsgQuery$list_facets(force = FALSE)
force: By default, every facet listing query is cached and
reused when possible. If TRUE, the previous cache is
abandoned and a new query is re-sent and cached. Default:
FALSE.
A character vector.
\dontrun{
q$list_facets()
}
list_fields()
List all available field names
EsgQuery$list_fields(force = FALSE)
force: By default, every field listing query is cached and
reused when possible. If TRUE, the previous cache is
abandoned and a new query is re-sent and cached. Default:
FALSE.
A character vector or NULL if no field listing is found.
\dontrun{
q$list_fields()
}
list_shards()
List all available shards.
EsgQuery$list_shards(force = FALSE)
force: By default, every shard listing query is cached and
reused when possible. If TRUE, the previous cache is
abandoned and a new query is re-sent and cached. Default:
FALSE.
A character vector or NULL if no shard listing is found.
\dontrun{
q$list_shards()
}
list_values()
List all available values of specific facets.
EsgQuery$list_values(facets, force = FALSE)
facets: A character vector giving the facet names.
force: By default, every value listing query is cached and
reused when possible. If TRUE, the previous cache is
abandoned and a new query is re-sent and cached. Default:
FALSE.
If length(facets) == 1, a named integer vector giving
the facet value counts. Otherwise, a list of named integer
vectors of the same length as facets.
\dontrun{
q$list_values(c("activity_id", "experiment_id"))
}
project()
Get or set the project facet parameter.
EsgQuery$project(value = "CMIP6")
value: A character vector, NULL, or a negated character
expression such as !"CMIP6". If omitted, the current value is
returned. Default when setting without an explicit value:
"CMIP6".
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
activity_id()
Get or set the activity_id facet parameter.
EsgQuery$activity_id(value)
value: A character vector, NULL, or a negated character
expression such as !c("CFMIP", "ScenarioMIP"). If omitted,
the current value is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
experiment_id()
Get or set the experiment_id facet parameter.
EsgQuery$experiment_id(value)
value: A character vector, NULL, or a negated character
expression such as !c("ssp126", "ssp585"). If omitted, the
current value is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
source_id()
Get or set the source_id facet parameter.
EsgQuery$source_id(value)
value: A character vector, NULL, or a negated character
expression. If omitted, the current value is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
variable_id()
Get or set the variable_id facet parameter.
EsgQuery$variable_id(value)
value: A character vector, NULL, or a negated character
expression. If omitted, the current value is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
frequency()
Get or set the frequency facet parameter.
EsgQuery$frequency(value)
value: A character vector, NULL, or a negated character
expression. If omitted, the current value is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
variant_label()
Get or set the variant_label facet parameter.
EsgQuery$variant_label(value)
value: A character vector, NULL, or a negated character
expression. If omitted, the current value is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
nominal_resolution()
Get or set the nominal_resolution facet parameter.
EsgQuery$nominal_resolution(value)
value: A character vector, NULL, or a negated character
expression. If omitted, the current value is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
data_node()
Get or set the data_node facet parameter.
EsgQuery$data_node(value)
value: A character vector, NULL, or a negated character
expression. If omitted, the current value is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
facets()
Get or set the facets parameter used by $count().
EsgQuery$facets(value)
value: A character vector, "*", or NULL. If omitted, the
current value is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
fields()
Get or set the fields parameter.
EsgQuery$fields(value = "*")
value: A character vector, "*", or NULL. If omitted, the
current value is returned. Default when setting without an
explicit value: "*".
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
shards()
Get or set the shards parameter for distributed searches.
EsgQuery$shards(value)
value: A character vector or NULL. If omitted, the current
value is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
datetime_range()
Get or set temporal coverage overlap constraints.
EsgQuery$datetime_range(start, stop)
start, stop: Temporal boundary strings accepted by solr_date(),
complete Solr range expressions, "*", or NULL. If both are
omitted, the current range state is returned. The helper renders
Solr constraints for the ESGF REST start/end temporal
coverage keyword semantics.
If either boundary is supplied, the modified EsgQuery
object. Otherwise, a list with start and stop elements.
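The overlap semantics can be illustrated in base R. The sketch below assumes that a dataset matches when its coverage ends on or after the requested start and begins on or before the requested stop, and that boundaries are rendered as ISO 8601 UTC timestamps, which is the format Solr uses for date fields. Both helpers are hypothetical and are not the package's `solr_date()`.

```r
# Hypothetical helpers for illustration only.

# Format a date as an ISO 8601 UTC timestamp, as used by Solr date fields.
to_solr_datetime <- function(x) {
  format(as.POSIXct(x, tz = "UTC"), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

# TRUE when the dataset coverage [data_start, data_stop] overlaps the
# requested window [query_start, query_stop].
covers <- function(data_start, data_stop, query_start, query_stop) {
  as.Date(data_stop) >= as.Date(query_start) &
    as.Date(data_start) <= as.Date(query_stop)
}

to_solr_datetime("2050-01-01")
covers("2015-01-01", "2100-12-31", "2050-01-01", "2060-12-31")
covers("1850-01-01", "2014-12-31", "2050-01-01", "2060-12-31")
```

Under this rule a scenario run spanning 2015 to 2100 matches a 2050 to 2060 window, while a historical run ending in 2014 does not.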
timestamp_range()
Get or set Solr index timestamp range constraints.
EsgQuery$timestamp_range(from, to)
from, to: Timestamp boundary strings accepted by solr_date(),
"*", or NULL. Complete Solr range expressions are not
accepted here. If both are omitted, the current range state is
returned.
If either boundary is supplied, the modified EsgQuery
object. Otherwise, a list with from and to elements.
version_range()
Get or set version range constraints.
EsgQuery$version_range(min, max)
min, max: Version boundaries such as 20200101, "20200101",
simplified dates, "*", or NULL. ESGF version is queried
as a numeric field; simplified date inputs are normalized to
comparable YYYYMMDD integer boundaries before rendering.
Solr Date Math and complete range expressions are not accepted
here. If both are omitted, the current range state is returned.
If either boundary is supplied, the modified EsgQuery
object. Otherwise, a list with min and max elements.
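The normalization to comparable YYYYMMDD integers can be sketched in base R. The helpers below are hypothetical, handle only full ISO dates, 8-digit versions, `"*"`, and `NULL`, and render the bounds with standard Solr range syntax. How the package itself parses simplified dates or renders the final constraint is not shown in this documentation.

```r
# Hypothetical helpers; not the package's implementation.

# Convert one boundary to an integer YYYYMMDD value, or "*" for open-ended.
to_version_bound <- function(x) {
  if (is.null(x) || identical(x, "*")) return("*")
  if (is.numeric(x) || grepl("^[0-9]{8}$", x)) return(as.integer(x))
  as.integer(format(as.Date(x), "%Y%m%d"))
}

# Render an inclusive Solr-style range expression from two boundaries.
version_range_expr <- function(min = "*", max = "*") {
  sprintf("[%s TO %s]", to_version_bound(min), to_version_bound(max))
}

version_range_expr("2020-01-01", 20211231)
version_range_expr(min = "20200101")
```

Comparing versions as integers works because a YYYYMMDD value sorts in the same order as the date it encodes.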
replica()
Get or set the replica parameter.
EsgQuery$replica(value)
value: A flag or NULL. If omitted, the current value is
returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
latest()
Get or set the latest parameter.
EsgQuery$latest(value = NULL)
value: A flag, or NULL to remove the latest constraint. If
omitted, the current value is returned.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
limit()
Get or set the limit parameter.
EsgQuery$limit(value = 10L)
value: A positive integer. If omitted, the current value is
returned. Default when setting without an explicit value: 10L.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
offset()
Get or set the offset parameter.
EsgQuery$offset(value = 0L)
value: A non-negative integer. If omitted, the current value is
returned. Default when setting without an explicit value: 0L.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
distrib()
Get or set the distrib parameter.
EsgQuery$distrib(value = TRUE)
value: A flag. If omitted, the current value is returned.
Default when setting without an explicit value: TRUE.
If value is supplied, the modified EsgQuery object.
Otherwise, a QueryParam object or NULL.
params()
Get or set ad hoc query parameters.
$params() handles parameters without dedicated methods and can also
update supported dedicated parameters by name.
The type and format control parameters cannot be changed here:
EsgQuery always performs Dataset queries and always parses JSON
responses. Use EsgResultDataset$collect() to collect File or
Aggregation records from Dataset results.
EsgQuery$params(...)
...: Named parameter values. If omitted, existing ad hoc
parameters are returned. If a single unnamed NULL is supplied,
all ad hoc parameters are removed.
If parameters are supplied, the modified EsgQuery object.
Otherwise, a named list of QueryParam objects.
url()
Get the URL of the actual query or of the wget script.
The wget script URL can be used to download a bash script that contains wget commands for downloading all files matching the query constraints. This is useful for batch downloading large amounts of data.
EsgQuery$url(wget = FALSE)
wget: Whether to return the URL of the wget script that can be
used to download all files matching the given constraints.
Default: FALSE.
A single string.
\dontrun{
q$url()
# get the wget script URL
q$url(wget = TRUE)
# You can download the wget script using the URL directly. For
# example, the code below downloads the script and saves it as
# 'wget.sh' in R's temporary folder:
download.file(q$url(TRUE), file.path(tempdir(), "wget.sh"), mode = "wb")
}
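To make the notion of a query URL concrete, the base R sketch below assembles one from a parameter list. It assumes the conventional ESGF search endpoint path `/esg-search/search`, comma-separated values for multi-valued facets, and `!=` for negated facets. The helper `build_search_url()` is hypothetical, and the URL returned by `$url()` may differ, especially on bridge nodes.

```r
# Hypothetical helper; a sketch of the general URL shape only.
build_search_url <- function(index_node, params, negate = character()) {
  stopifnot(is.list(params), !is.null(names(params)))

  # Encode one parameter value; logical flags become "true"/"false".
  encode <- function(v) {
    v <- if (is.logical(v)) tolower(as.character(v)) else as.character(v)
    paste(vapply(v, utils::URLencode, character(1), reserved = TRUE),
          collapse = ",")
  }

  # Render each parameter as name=value, or name!=value when negated.
  parts <- vapply(names(params), function(nm) {
    op <- if (nm %in% negate) "!=" else "="
    paste0(nm, op, encode(params[[nm]]))
  }, character(1))

  # Drop trailing slashes from the node URL before appending the path.
  paste0(sub("/+$", "", index_node), "/esg-search/search?",
         paste(parts, collapse = "&"))
}

build_search_url(
  "https://esgf-data.dkrz.de/",
  list(project = "CMIP6", experiment_id = c("ssp126", "ssp585"),
       source_id = "CESM2", latest = TRUE, limit = 10L),
  negate = "source_id"
)
```

Keeping the negated names in a separate argument mirrors the idea that negation is a property of the constraint rather than of its value.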
count()
Send a query of facet counting and fetch the results
EsgQuery$count(facets = TRUE)
facets: NULL, a flag or a character vector. There are three
options:
If NULL or FALSE, only the total number of matched records is
returned.
If TRUE, the value of $facets()
is used to limit the facets. If $facets() returns NULL, only the
total count is returned. This is the default value.
If a character vector, it is used to limit the facets.
If facets equals NULL or FALSE, or $facets() returns NULL,
an integer.
Otherwise, a named list with the first element always being total
which is the total number of matched records. Other elements have
the same length as input facets and are all named integer vectors.
\dontrun{
# get the total number of matched records
q$count(NULL) # or q$count(facets = FALSE)
# count records for specific facets
q$facets(c("activity_id", "source_id"))$count()
# same as above
q$count(facets = c("activity_id", "source_id"))
}
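Because the faceted result is a plain named list of named integer vectors, it can be post-processed with base R. The sketch below uses made-up counts shaped like the documented return value. The helper `top_facet_values()` and the model names are hypothetical.

```r
# Hypothetical helper: the n most frequent values of one facet.
top_facet_values <- function(counts, facet, n = 3L) {
  stopifnot(is.list(counts), facet %in% names(counts))
  utils::head(sort(counts[[facet]], decreasing = TRUE), n)
}

# Made-up numbers that only mimic the documented structure:
# `total` first, then one named integer vector per requested facet.
counts <- list(
  total     = 6L,
  source_id = c(ModelA = 1L, ModelB = 3L, ModelC = 2L)
)

top_facet_values(counts, "source_id", n = 2L)
```

The same helper works on the output of `q$count(facets = c("activity_id", "source_id"))` for either facet name.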
collect()
Send the actual query and fetch the results
$collect() sends the actual query to the ESGF search services.
By default it collects type=Dataset results and returns an
EsgResultDataset object. If type is "File" or
"Aggregation", it first collects matching Dataset results and then
collects child File or Aggregation results for those datasets.
The fields included depend on fields parameter.
However, the following fields are always included in the results:
access, data_node, id, index_node, instance_id, latest, master_id, number_of_aggregations, number_of_files, replica, size, url, version.
When a local EsgDict is available for the query project, $collect()
also performs a warning-only dictionary check before sending the query.
Missing local dictionaries are ignored and never downloaded.
EsgQuery$collect(
all = FALSE,
limit = TRUE,
params = TRUE,
type = "Dataset",
fields = NULL,
progress = getOption("epwshiftr.progress", interactive()),
...
)
all: Whether to collect all results regardless of the value of
offset. Default: FALSE.
limit: If all = FALSE, the maximum number of records to
collect in this request. If all = TRUE, the page size used
for each paginated request, not a total cap. When all = TRUE
and limit = TRUE, the current query limit value is used;
if limit = FALSE, the maximum allowed limit of
10000 is used. It can also be a positive
integer used as a temporary page size. Default: TRUE.
params: Whether to include facet fields that have parameter
constraints explicitly set using EsgQuery$project(),
EsgQuery$activity_id(), EsgQuery$params(), and so on, in the
returned fields. For example, if you set $experiment_id("ssp585"),
the experiment_id field will be included in the results when
params = TRUE. Default: TRUE.
type: Result type to collect. One of "Dataset", "File",
or "Aggregation". Default: "Dataset".
fields: Optional fields used only when type is "File" or
"Aggregation". Dataset fields should be configured with
$fields() before collecting.
progress: Whether to show a progress bar while collecting ESGF
JSON search pages. By default, the value of option
epwshiftr.progress is used, falling back to interactive().
...: Arguments passed to EsgResultDataset child collection
when type is "File" or "Aggregation", including the
data_node scope filter and child-query controls.
File/Aggregation collection does not use ESGF datetime search
parameters; use $filter_time() on the returned result for
time filtering.
An EsgResultDataset, EsgResultFile, or EsgResultAggregation object.
\dontrun{
# by default, all fields with constraints are included in the results
query <- esg_query()$experiment_id("ssp585")$frequency("1hr")$fields("source_id")
res1 <- query$collect()
res1$fields
# set `params` to `FALSE` to exclude them
query$collect(params = FALSE)$fields
# collect all matched records with `query$limit()` records per query
res2 <- query$collect(all = TRUE, limit = TRUE)
identical(query$count(), res2$count())
# same as above, but collect all matched records with max allowed
# record limit per query
res3 <- query$collect(all = TRUE, limit = FALSE)
identical(res2$count(), res3$count())
# same as above, but collect all matched records with specified limit
# per query
res4 <- query$collect(all = TRUE, limit = 30)
identical(res2$count(), res4$count())
}
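With `all = TRUE`, `limit` acts as a page size, so the number of requests depends on the total count. The base R sketch below computes the offsets such a paginated collection would step through. The helper `page_offsets()` is hypothetical, and the cap of 10000 is the maximum limit stated above.

```r
# Hypothetical helper: offsets visited when paging through `total` records.
page_offsets <- function(total, page_size, max_limit = 10000L) {
  stopifnot(total >= 0L, page_size >= 1L)

  # A page can never be larger than the maximum allowed limit.
  page_size <- min(page_size, max_limit)
  if (total == 0L) return(integer())

  # One request per page, starting at offset 0.
  seq.int(0L, total - 1L, by = page_size)
}

page_offsets(25L, 10L)
length(page_offsets(25000L, 50000L))
```

A smaller page size means more requests but smaller responses, which is why `limit = 30` and `limit = FALSE` in the examples above return the same records at different cost.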
state()
Get the current query state.
$state() returns a read-only snapshot containing the current index
node and the current parameter state.
EsgQuery$state(name = NULL, null = FALSE)
name: A character vector of parameter names to include, or
NULL to include all parameters.
null: If TRUE, include parameters whose current value is
NULL. Otherwise, omit unset parameters.
A named list with elements index_node and parameter.
\dontrun{
q$state()
q$state(null = TRUE)
}
reset()
Reset query parameters to their defaults.
$reset() clears the current parameter store and restores the default
query parameters. The current index node is kept unchanged.
EsgQuery$reset()
The modified EsgQuery object itself.
\dontrun{
q$experiment_id("ssp585")$reset()
}
save()
Save the query into a JSON file
$save() puts main data of an EsgQuery object into a JSON file
which can be loaded to restore the current state of query using
EsgQuery$load().
EsgQuery$save(file = "query.json", pretty = TRUE)
file: A string indicating the JSON file path to save the data to.
pretty: Whether to add indentation whitespace to JSON output.
For details, please see jsonlite::toJSON(). Default: TRUE.
The full path of the output JSON file.
\dontrun{
q$save(tempfile(fileext = ".json"))
}
load()
Restore the query state from a JSON file
$load() reads data of an EsgQuery object from a JSON file
created using
EsgQuery$save().
EsgQuery$load(file)
file: A string indicating the JSON file path to read the data from.
The modified EsgQuery object itself.
\dontrun{
f <- tempfile(fileext = ".json")
q <- esg_query()
json <- q$save(f)
q$load(f)
}
print()
Print a summary of the current EsgQuery object
$print() gives the summary of current EsgQuery object including
the index node URL and all query parameters.
EsgQuery$print()
The EsgQuery object itself, invisibly.
\dontrun{
q$print()
}
clone()
The objects of this class are cloneable with this method.
EsgQuery$clone(deep = FALSE)
deep: Whether to make a deep clone.
For bridge index nodes, only predefined common facets are returned.
$list_fields() can be used to get all available fields,
including facets.
Hongyuan Jia
## ------------------------------------------------
## Method `EsgQuery$new`
## ------------------------------------------------

## Not run:
q <- EsgQuery$new(index_node = "https://esgf-node.ornl.gov")
q

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$index_node`
## ------------------------------------------------

## Not run:
q$index_node()
q$index_node("https://esgf.ceda.ac.uk")

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$list_facets`
## ------------------------------------------------

## Not run:
q$list_facets()

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$list_fields`
## ------------------------------------------------

## Not run:
q$list_fields()

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$list_shards`
## ------------------------------------------------

## Not run:
q$list_shards()

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$list_values`
## ------------------------------------------------

## Not run:
q$list_values(c("activity_id", "experiment_id"))

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$url`
## ------------------------------------------------

## Not run:
q$url()

# get the wget script URL
q$url(wget = TRUE)

# You can download the wget script using the URL directly. For
# example, the code below downloads the script and saves it as
# 'wget.sh' in R's temporary folder:
download.file(q$url(TRUE), file.path(tempdir(), "wget.sh"), mode = "wb")

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$count`
## ------------------------------------------------

## Not run:
# get the total number of matched records
q$count(NULL) # or q$count(facets = FALSE)

# count records for specific facets
q$facets(c("activity_id", "source_id"))$count()

# same as above
q$count(facets = c("activity_id", "source_id"))

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$collect`
## ------------------------------------------------

## Not run:
# by default, all fields with constraints are included in the results
query <- esg_query()$experiment_id("ssp585")$frequency("1hr")$fields("source_id")
res1 <- query$collect()
res1$fields

# set `params` to `FALSE` to exclude them
query$collect(params = FALSE)$fields

# collect all matched records with `query$limit()` records per query
res2 <- query$collect(all = TRUE, limit = TRUE)
identical(query$count(), res2$count())

# same as above, but collect all matched records with max allowed
# record limit per query
res3 <- query$collect(all = TRUE, limit = FALSE)
identical(res2$count(), res3$count())

# same as above, but collect all matched records with specified limit
# per query
res4 <- query$collect(all = TRUE, limit = 30)
identical(res2$count(), res4$count())

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$state`
## ------------------------------------------------

## Not run:
q$state()
q$state(null = TRUE)

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$reset`
## ------------------------------------------------

## Not run:
q$experiment_id("ssp585")$reset()

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$save`
## ------------------------------------------------

## Not run:
q$save(tempfile(fileext = ".json"))

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$load`
## ------------------------------------------------

## Not run:
f <- tempfile(fileext = ".json")
q <- esg_query()
json <- q$save(f)
q$load(f)

## End(Not run)

## ------------------------------------------------
## Method `EsgQuery$print`
## ------------------------------------------------

## Not run:
q$print()

## End(Not run)
EsgStore manages a local DuckDB manifest and a fixed directory layout for
query result snapshots, dictionaries, source files, downloaded NetCDF files,
Parquet regional extracts, and generated outputs.
path: Store directory.
manifest: DuckDB manifest path.
is_open: Whether the manifest connection is open.
new()
Create or open a local store.
EsgStore$new(path = NULL, create = TRUE, overwrite = FALSE)
path: Store directory. Default: store_dir().
create: If TRUE, create the store directory when it does not
exist. Default: TRUE.
overwrite: If TRUE, remove an existing store directory before
creating a new store. Default: FALSE.
An EsgStore object.
close()
Close the DuckDB connection.
EsgStore$close()
The store object itself, invisibly.
get_meta()
Return a store metadata value.
EsgStore$get_meta(key, default = NULL)
key: Metadata key.
default: Value returned when key is not set.
A single string, or default.
set_meta()
Set a store metadata value.
EsgStore$set_meta(key, value)
key: Metadata key.
value: Metadata value. NULL is stored as NA.
The store object, invisibly.
download_layout()
Return the store download layout policy.
EsgStore$download_layout()
A named list describing how store downloads are placed under
downloads/.
set_download_layout()
Configure how store-managed ESGF downloads are placed under
downloads/.
EsgStore$set_download_layout(
layout = c("flat", "dataset", "drs", "template"),
template = NULL,
include_version = TRUE,
collision = c("error", "checksum", "suffix"),
missing = c("fallback", "error")
)
layout: Download layout. "flat" stores files directly under
downloads/; "dataset" groups by dataset; "drs" uses a
CMIP6-style DRS path; "template" uses template.
template: Optional subdirectory template for layout = "template", using placeholders such as {source_id}.
include_version: Whether DRS paths include the ESGF version.
Default: TRUE.
collision: How to handle different logical files that map to
the same local path. Default: "error".
missing: How to handle missing layout fields. Default:
"fallback".
The store object, invisibly.
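To make the "template" layout more concrete, the base R sketch below expands `{field}` placeholders from a named list of file metadata. It is only an illustration of the idea: `expand_template()` is a hypothetical helper, the `"unknown"` fallback token is an assumption, and the package may resolve templates and missing fields differently.

```r
# Hypothetical helper (not part of epwshiftr): expand "{field}" placeholders
# in a subdirectory template using a named list of file metadata.
expand_template <- function(template, fields, missing = c("fallback", "error")) {
  missing <- match.arg(missing)
  # Find every "{name}" placeholder in the template
  keys <- regmatches(template, gregexpr("\\{[^{}]+\\}", template))[[1]]
  out <- template
  for (key in unique(keys)) {
    name <- substr(key, 2L, nchar(key) - 1L)
    value <- fields[[name]]
    if (is.null(value) || is.na(value)) {
      if (missing == "error") stop("Missing layout field: ", name)
      value <- "unknown"  # assumed fallback token, chosen for this sketch only
    }
    out <- gsub(key, value, out, fixed = TRUE)
  }
  out
}

meta <- list(source_id = "EC-Earth3", experiment_id = "ssp585")
expand_template("{source_id}/{experiment_id}", meta)
expand_template("{source_id}/{grid_label}", meta)
```

The `missing` argument mirrors the documented choice between falling back and raising an error when a layout field is absent.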
register_artifact()
Register a file artifact in the store manifest.
EsgStore$register_artifact(
  kind,
  path,
  role = NULL,
  project = NULL,
  status = "available",
  checksum = NULL,
  checksum_type = "sha256",
  size = NULL,
  query_id = NULL,
  file_key = NULL,
  dict_id = NULL,
  source_url = NULL,
  source_repo = NULL,
  source_tag = NULL,
  source_commit = NULL,
  metadata = list()
)
kind: Artifact kind.
path: Artifact path. Absolute paths must be inside the store root.
role: Artifact role. If NULL, a role is inferred from kind.
project: Optional ESGF project.
status: Artifact status. Default: "available".
checksum: Expected checksum. If NULL and path exists, it is
calculated with checksum_type.
checksum_type: Checksum algorithm. Default: "sha256".
size: Artifact size in bytes. If NULL and path exists, it is
read from the file.
query_id, file_key, dict_id: Optional manifest links.
source_url, source_repo, source_tag, source_commit: Optional source provenance.
metadata: Optional metadata list encoded as JSON.
The artifact ID.
artifact_path()
Return an artifact path from the manifest.
EsgStore$artifact_path(artifact_id)
artifact_id: Artifact ID.
Absolute artifact path.
validate()
Validate registered artifact files against the manifest.
EsgStore$validate()
A data.table with validation results.
add_query()
Add an ESGF query to the long-lived store query registry.
EsgStore$add_query(query, label = NULL, track = FALSE)
query: An EsgQuery object.
label: Optional label.
track: Whether to mark the query as tracked. Default: FALSE.
The stable query ID.
track_query()
Mark a stored ESGF query as tracked.
EsgStore$track_query(query_id)
query_id: Query ID returned by $add_query().
The store object, invisibly.
untrack_query()
Mark a stored ESGF query as untracked.
EsgStore$untrack_query(query_id)
query_id: Query ID returned by $add_query().
The store object, invisibly.
tag_query()
Add tags to a stored ESGF query.
EsgStore$tag_query(query_id, tag, replace = FALSE)
query_id: Query ID returned by $add_query().
tag: Character vector of tags.
replace: Whether to replace existing tags for the query.
A data.table of tags for the query.
untag_query()
Remove tags from a stored ESGF query.
EsgStore$untag_query(query_id, tag = NULL)
query_id: Query ID returned by $add_query().
tag: Optional tags. If NULL, all tags are removed.
A data.table of remaining tags for the query.
query_tags()
List stored ESGF query tags.
EsgStore$query_tags(query_id = NULL)
query_id: Optional query ID filter.
A data.table of query tags.
require_query()
Record that one stored query depends on another stored query.
EsgStore$require_query(query_id, parent_query_id)
query_id: Child query ID.
parent_query_id: Required parent query ID.
A data.table of query dependency edges.
unrequire_query()
Remove query dependency edges.
EsgStore$unrequire_query(query_id, parent_query_id = NULL)
query_id: Child query ID.
parent_query_id: Optional parent query ID. If NULL, all
parents for query_id are removed.
A data.table of remaining dependency edges for the query.
query_graph()
List stored query dependency edges.
EsgStore$query_graph(
query_id = NULL,
direction = c("children", "parents", "both"),
recursive = TRUE
)
query_id: Optional query ID anchor.
direction: Which edge direction to return for an anchor.
recursive: Whether to include transitive edges.
A data.table of dependency edges.
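The difference between direct and transitive edges can be sketched in base R. The edge table and `children_of()` below are hypothetical and only illustrate what `recursive = TRUE` means for the "children" direction; the column names are borrowed from the `$require_query()` arguments and are not confirmed to match the manifest schema.

```r
# Hypothetical edge table: each row says `query_id` requires `parent_query_id`.
edges <- data.frame(
  query_id = c("q2", "q3", "q4"),
  parent_query_id = c("q1", "q2", "q1"),
  stringsAsFactors = FALSE
)

# Collect children of an anchor query, optionally following edges transitively.
children_of <- function(edges, anchor, recursive = TRUE) {
  found <- character()
  frontier <- anchor
  while (length(frontier) > 0L) {
    direct <- edges$query_id[edges$parent_query_id %in% frontier]
    direct <- setdiff(direct, found)  # skip nodes already visited
    found <- c(found, direct)
    frontier <- if (recursive) direct else character()
  }
  found
}

children_of(edges, "q1", recursive = FALSE)  # direct children only
children_of(edges, "q1", recursive = TRUE)   # includes grandchildren
```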
queries()
List stored ESGF queries.
EsgStore$queries(tracked = NULL)
tracked: Optional tracked-state filter.
A data.table of stored query records.
query_files()
List files linked to a stored ESGF query.
EsgStore$query_files(query_id, status = NULL)
query_id: Query ID returned by $add_query().
status: Optional query-file status filter.
A data.table of linked file records.
preview_update_queries()
Preview tracked ESGF query updates without changing the store.
EsgStore$preview_update_queries(
  query_id = NULL,
  tracked = TRUE,
  tag = NULL,
  children = FALSE,
  detail = FALSE,
  all = TRUE,
  limit = FALSE,
  fields = "*",
  ...
)
query_id: Optional query ID. If NULL, tracked queries are
previewed by default.
tracked: Tracked-state filter used when query_id is NULL.
tag: Optional query tag filter used when query_id is NULL.
children: Whether to include dependency children of selected queries.
detail: Whether to return per-file changes together with the
summary. Default: FALSE.
all, limit, fields: Arguments passed to EsgQuery$collect().
...: Additional File query filters passed to EsgQuery$collect().
A data.table summary, or a list with summary and changes
when detail = TRUE.
update_queries()
Refresh stored ESGF queries and link their current File records.
EsgStore$update_queries(
query_id = NULL,
tracked = TRUE,
tag = NULL,
children = FALSE,
enqueue = FALSE,
downloader = NULL,
replica = "auto",
session_label = NULL,
service = "HTTPServer",
probe = TRUE,
probe_concurrency = NULL,
probe_cache_seconds = 3600L,
strategy = c("fastest", "first", "stable"),
all = TRUE,
limit = FALSE,
fields = "*",
...
)
query_id: Optional query ID. If NULL, tracked queries are
updated by default.
tracked: Tracked-state filter used when query_id is NULL.
tag: Optional query tag filter used when query_id is NULL.
children: Whether to include dependency children of selected queries.
enqueue: Whether to enqueue current files after updating.
Default: FALSE.
downloader: Optional Downloader used when enqueue = TRUE.
replica: Replica policy passed to $download_plan() when
enqueuing.
session_label: Optional download session label.
service: ESGF URL service used for the download plan.
probe: Whether to probe candidate URLs before ranking.
probe_concurrency: Maximum concurrent URL probes when
probe = TRUE. Default comes from the downloader worker count
when enqueue = TRUE.
probe_cache_seconds: Seconds to reuse fresh data-node probe
history before probing a URL again. Default: 3600.
strategy: Candidate ranking strategy.
all, limit, fields: Arguments passed to EsgQuery$collect().
...: Additional File query filters passed to EsgQuery$collect().
A data.table of query-file links touched by the update.
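A possible way to combine the registry methods above is sketched below: register a query as tracked, tag it, preview what a refresh would change, and then apply the refresh. `track_and_refresh()` is a hypothetical wrapper, and the label and tag values are placeholders; only the method names and arguments come from this reference. Nothing touches the network until the function is called with a real EsgStore and EsgQuery.

```r
# Hypothetical wrapper around the documented EsgStore query-registry methods.
track_and_refresh <- function(store, query, tag = "future-weather") {
  # Register the query and mark it as tracked
  id <- store$add_query(query, label = "ssp585 hourly files", track = TRUE)
  store$tag_query(id, tag)

  # Preview is documented as read-only; detail = TRUE adds per-file changes
  preview <- store$preview_update_queries(tag = tag, detail = TRUE)

  # Refresh the stored query and link its current File records
  links <- store$update_queries(tag = tag, enqueue = FALSE)

  list(query_id = id, preview = preview, links = links)
}
```

Running the preview before the update keeps the store unchanged until the reported changes have been reviewed.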
download_preflight()
Preview a tracked query download without changing the store.
EsgStore$download_preflight(
query_id,
downloader = NULL,
replica = "auto",
service = "HTTPServer",
probe = TRUE,
probe_concurrency = NULL,
probe_cache_seconds = 3600L,
strategy = c("fastest", "first", "stable"),
all = TRUE,
limit = FALSE,
fields = "*",
...
)
query_id: Query ID returned by $add_query().
downloader: Optional Downloader used only for node history, network policy, and cooldown policy.
replica: Replica policy passed to $download_plan().
service, probe, strategy: Download plan arguments.
probe_concurrency: Maximum concurrent URL probes when
probe = TRUE. Default comes from downloader when supplied.
probe_cache_seconds: Seconds to reuse fresh data-node probe
history before probing a URL again. Default: 3600.
all, limit, fields: Arguments passed to EsgQuery$collect().
...: Additional File query filters passed to EsgQuery$collect().
A list with summary, changes, files, and candidates.
download_query()
Refresh, enqueue, and optionally run downloads for a stored ESGF query.
EsgStore$download_query(
query_id,
downloader = NULL,
replica = "auto",
dry_run = FALSE,
run = TRUE,
background = FALSE,
mode = c("process", "daemon"),
session_label = NULL,
service = "HTTPServer",
probe = TRUE,
probe_concurrency = NULL,
probe_cache_seconds = 3600L,
strategy = c("fastest", "first", "stable"),
progress = TRUE,
overwrite = FALSE,
resume = TRUE,
all = TRUE,
limit = FALSE,
fields = "*",
...
)
query_id: Query ID returned by $add_query().
downloader: Optional Downloader. Default: $downloader().
replica: Replica policy passed to $download_plan().
dry_run: Whether to return a download preflight without
changing the store, enqueueing, or downloading. Default:
FALSE.
run: Whether to run the queued session immediately. Default:
TRUE.
background: Whether to run the queued session in the background.
Default: FALSE.
mode: Background execution mode. "process" starts a detached
Rscript; "daemon" submits the job to a running downloader
daemon.
session_label: Optional download session label.
service, probe, strategy: Download plan arguments.
probe_concurrency: Maximum concurrent URL probes when
probe = TRUE. Default comes from the downloader worker count.
probe_cache_seconds: Seconds to reuse fresh data-node probe
history before probing a URL again. Default: 3600.
progress, overwrite, resume: Run arguments.
all, limit, fields: Arguments passed to EsgQuery$collect().
...: Additional File query filters passed to EsgQuery$collect().
The created downloader session ID, NA_character_ when there
is no pending file to download, or a one-row background job
record when run = TRUE and background = TRUE.
download_status()
Return downloader tasks linked to stored query files.
EsgStore$download_status(query_id = NULL, session_id = NULL, downloader = NULL)
query_id: Optional stored query ID.
session_id: Optional downloader session ID.
downloader: Optional Downloader. Default: $downloader().
A data.table of downloader task rows.
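The download methods can be chained into a cautious workflow: ask for a dry run first, stop if the plan is larger than expected, then run the download and inspect the task rows. The sketch below is a hypothetical wrapper; it assumes the dry-run result exposes a `files` element, as the `$download_preflight()` return value is described, and `max_files` is an illustrative safety limit that is not part of the package.

```r
# Hypothetical wrapper: preflight first, then download in the foreground.
download_with_preflight <- function(store, query_id, max_files = 500L) {
  # dry_run = TRUE is documented to leave the store untouched
  plan <- store$download_query(query_id, dry_run = TRUE)
  n <- NROW(plan$files)  # assumes the preflight list has a `files` table
  if (n > max_files) {
    stop("Preflight lists ", n, " files, more than max_files = ", max_files)
  }

  session <- store$download_query(
    query_id,
    run = TRUE,
    background = FALSE,
    resume = TRUE
  )

  # NA_character_ is documented to mean there was nothing pending
  if (length(session) == 1L && is.na(session)) return(NULL)

  store$download_status(query_id = query_id, session_id = session)
}
```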
query_status()
Summarise tracked ESGF query file and download status.
EsgStore$query_status(query_id = NULL, downloader = NULL)
query_id: Optional stored query ID vector. If NULL, all
stored ESGF queries are summarised.
downloader: Optional Downloader. Default: $downloader().
A data.table with one row per stored query.
query_updates()
List tracked query update runs.
EsgStore$query_updates(query_id = NULL, latest = FALSE)
query_id: Optional stored query ID filter.
latest: Whether to return only the latest update per query.
A data.table of update run summaries.
query_changes()
List per-file changes recorded by tracked query updates.
EsgStore$query_changes(update_id = NULL, query_id = NULL, change_type = NULL)
update_id: Optional update run ID filter.
query_id: Optional stored query ID filter.
change_type: Optional change type filter.
A data.table of per-file query update changes.
workflow_status()
Summarise query, download, local, and extraction status together.
EsgStore$workflow_status(query_id = NULL, downloader = NULL)
query_id: Optional stored query ID filter.
downloader: Optional Downloader. Default: $downloader().
A data.table with one row per stored query.
workflow_report()
Return a compact ESGF query workflow health report.
EsgStore$workflow_report(query_id = NULL, downloader = NULL)
query_id: Optional stored query ID filter.
downloader: Optional Downloader. Default: $downloader().
A list with summary, updates, changes, downloads,
and nodes.
remove_query()
Remove stored ESGF queries and optionally delete orphaned local files.
EsgStore$remove_query(query_id, delete = c("none", "orphaned"))
query_id: Stored query ID vector.
delete: Whether to leave local files untouched ("none") or
delete files orphaned by the removal ("orphaned").
A data.table describing removed queries.
remove_files()
Remove ESGF file records and optionally delete local artifacts.
EsgStore$remove_files(file_key, delete_local = FALSE, force = FALSE)
file_key: File key vector.
delete_local: Whether to delete local NetCDF files. Default:
FALSE.
force: Whether to remove files still linked to queries.
Default: FALSE.
A data.table describing removed file records.
prune_orphans()
Report or remove file records no longer linked to any query.
EsgStore$prune_orphans(delete_local = FALSE)
delete_local: Whether to delete local NetCDF files and remove
orphaned registry records. Default: FALSE.
A data.table of orphaned file records.
storage_report()
Summarise store download storage, registered local assets, temporary files, and cleanup candidates.
EsgStore$storage_report(detail = FALSE)
detail: Whether to return detailed file tables. Default:
FALSE.
A summary data.table, or a list when detail = TRUE.
validate_files()
Validate store-managed NetCDF downloads against the manifest.
EsgStore$validate_files(query_id = NULL, checksum = FALSE, layout = TRUE)
query_id: Optional stored query IDs to validate. When NULL,
all known downloaded ESGF files are checked.
checksum: Whether to compute file checksums. Default: FALSE.
layout: Whether to compare registered files with the current
download layout policy. Default: TRUE.
A list with summary, files, artifacts, untracked, and
actions data.tables. The method is read-only.
repair_files()
Repair safe store download inconsistencies reported by
$validate_files().
EsgStore$repair_files(actions = NULL, dry_run = TRUE)
actions: Optional action table from $validate_files()$actions.
When NULL, actions are generated from $validate_files().
dry_run: Whether to only report planned repairs. Default:
TRUE.
A data.table describing attempted repairs.
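Because `$validate_files()` is read-only and `$repair_files()` defaults to a dry run, the two can be combined into a review-then-apply step. `review_and_repair()` below is a hypothetical wrapper that uses only the arguments listed above; it always reports the planned repairs first and only applies them when explicitly asked.

```r
# Hypothetical wrapper: validate, report planned repairs, optionally apply them.
review_and_repair <- function(store, query_id = NULL, apply = FALSE) {
  # Read-only validation; checksum = TRUE also verifies file contents
  report <- store$validate_files(query_id = query_id, checksum = TRUE, layout = TRUE)

  # Report what would be repaired without changing anything
  planned <- store$repair_files(actions = report$actions, dry_run = TRUE)
  if (!apply) return(planned)

  # Apply the same action table for real
  store$repair_files(actions = report$actions, dry_run = FALSE)
}
```

Passing the same `actions` table to both calls keeps the applied repairs identical to the ones that were reviewed.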
cleanup_downloads()
Report or remove download cleanup candidates.
EsgStore$cleanup_downloads(
scope = c("tmp", "orphan_records", "untracked_files", "missing_records"),
dry_run = TRUE,
older_than = NULL
)
scope: Cleanup scopes. Supported values are "tmp",
"orphan_records", "untracked_files", and
"missing_records".
dry_run: Whether to only report cleanup candidates. Default:
TRUE.
older_than: Optional age filter for file scopes. A numeric value
is interpreted as seconds before now; a POSIXct value is used
as an absolute mtime cutoff.
A data.table describing cleanup candidates or removals.
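The two accepted forms of `older_than` can be illustrated with a small base R sketch. `resolve_cutoff()` is a hypothetical helper showing how a numeric age and a POSIXct timestamp could both reduce to one modification-time cutoff; it is not the package's code.

```r
# Hypothetical helper: turn `older_than` into an absolute mtime cutoff.
resolve_cutoff <- function(older_than, now = Sys.time()) {
  if (is.null(older_than)) return(NULL)                  # no age filter
  if (inherits(older_than, "POSIXct")) return(older_than) # absolute cutoff
  if (is.numeric(older_than)) return(now - older_than)    # seconds before now
  stop("`older_than` must be NULL, numeric seconds, or POSIXct")
}

now <- as.POSIXct("2025-01-02 00:00:00", tz = "UTC")
resolve_cutoff(24 * 60 * 60, now = now)  # one day before `now`
resolve_cutoff(as.POSIXct("2024-12-01", tz = "UTC"), now = now)
```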
retry_downloads()
Requeue retryable downloader tasks linked to stored query files.
EsgStore$retry_downloads(
query_id = NULL,
session_id = NULL,
downloader = NULL,
status = c("error", "cancelled"),
run = TRUE,
...
)
query_id: Optional stored query ID.
session_id: Optional downloader session ID.
downloader: Optional Downloader. Default: $downloader().
status: Retryable statuses. Default: c("error", "cancelled").
run: Whether to run requeued tasks immediately. Default: TRUE.
...: Additional arguments passed to Downloader$run().
A data.table of matching task rows after retry handling.
add_files()
Add File or Aggregation query results to the local file catalog.
EsgStore$add_files(files, label = NULL)
files: An EsgResultFile or EsgResultAggregation object.
label: Optional label for this query run.
The created or updated query ID.
downloader()
Return a Downloader bound to this store.
EsgStore$downloader(...)
...: Additional arguments passed to Downloader$new().
A Downloader object.
download_files()
Enqueue and optionally download ESGF file records through the store downloader.
EsgStore$download_files(
files = NULL,
query_id = NULL,
replica = "auto",
downloader = NULL,
run = TRUE,
background = FALSE,
mode = c("process", "daemon"),
session_label = NULL,
service = "HTTPServer",
probe = TRUE,
probe_concurrency = NULL,
probe_cache_seconds = 3600L,
strategy = c("fastest", "first", "stable"),
progress = TRUE,
overwrite = FALSE,
resume = TRUE,
...
)
files: Optional EsgResultFile or EsgResultAggregation object. If supplied, it is cataloged before the download plan is created.
query_id: Optional file collection query IDs to enqueue when
files is NULL. If NULL, all cataloged files missing local
paths are considered.
replica: Replica policy passed to $download_plan().
downloader: Optional Downloader. Default: $downloader().
run: Whether to run the queued session immediately. Default: TRUE.
background: Whether to run the queued session in the background.
Default: FALSE.
mode: Background execution mode. "process" starts a detached
Rscript; "daemon" submits the job to a running downloader
daemon.
session_label: Optional download session label.
service: ESGF URL service to download from. Default:
"HTTPServer".
probe: Whether to lightly probe URLs before ranking them.
probe_concurrency: Maximum concurrent URL probes when
probe = TRUE. Default comes from the downloader worker count.
probe_cache_seconds: Seconds to reuse fresh data-node probe
history before probing a URL again. Default: 3600.
strategy: Candidate ranking strategy.
progress: Whether to show per-file download progress.
overwrite: Whether to overwrite existing final files.
resume: Whether to resume interrupted .part files.
...: Additional arguments passed to $download_plan() and
Downloader$run().
The created downloader session ID, or a one-row background
job record when run = TRUE and background = TRUE.
sync_downloads()
Register completed downloader tasks as local store artifacts.
EsgStore$sync_downloads(downloader = NULL)
downloader: Optional Downloader. Default: $downloader().
A data.table of completed tasks.
plan_region()
Plan regional extraction jobs from cataloged files.
EsgStore$plan_region(
  query_id,
  lon,
  lat,
  time,
  site_id = "site-1",
  variable_id = NULL,
  filters = list(),
  method = "nearest"
)
query_id: Query ID returned by $add_files().
lon, lat: Target longitude and latitude.
time: Length-2 time range.
site_id: Site identifier. Default: "site-1".
variable_id: Optional variable IDs. If NULL, all cataloged
variables in the query are used.
filters: Named list of exact-match file catalog filters.
method: Grid extraction method. One of "nearest", "idw",
"bilinear", or "mean". Default: "nearest".
A data.table of extraction plan rows.
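To give a feel for how two of the extraction methods differ, the base R sketch below picks a value for a target point from a tiny made-up grid using either the nearest cell or inverse-distance weighting. Everything here is an assumption for illustration: the grid values, the plain Euclidean distance in degrees, and the `power` parameter are not taken from the package, which may use a different distance metric and neighbourhood.

```r
# Made-up 2 x 2 grid of near-surface air temperature (K) around a site.
grid <- data.frame(
  lon = c(103, 104, 103, 104),
  lat = c(1, 1, 2, 2),
  tas = c(300, 301, 299, 302)
)

# Hypothetical helper: "nearest" takes the closest cell, "idw" averages all
# cells with weights proportional to 1 / distance^power.
extract_point <- function(grid, lon, lat, method = c("nearest", "idw"), power = 2) {
  method <- match.arg(method)
  d <- sqrt((grid$lon - lon)^2 + (grid$lat - lat)^2)
  # An exact hit on a grid cell is returned as-is to avoid dividing by zero
  if (method == "nearest" || any(d == 0)) return(grid$tas[which.min(d)])
  w <- 1 / d^power
  sum(w * grid$tas) / sum(w)
}

extract_point(grid, lon = 103.8, lat = 1.3, method = "nearest")
extract_point(grid, lon = 103.8, lat = 1.3, method = "idw")
```

The nearest-cell result is always one of the grid values, while the weighted result lies between the smallest and largest of them.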
extract()
Execute pending or failed regional extraction plans.
EsgStore$extract(
plan_id = NULL,
status = c("pending", "failed"),
fallback = c("auto", "error"),
overwrite = FALSE,
resume = TRUE
)
plan_id: Optional plan IDs to run.
status: Plan statuses to run when plan_id is NULL.
Default: c("pending", "failed").
fallback: What to do when OPeNDAP is unavailable. "auto"
downloads through HTTPServer when possible; "error" marks
the plan failed without downloading. Default: "auto".
overwrite: If TRUE, overwrite existing Parquet outputs.
Default: FALSE.
resume: Whether to reuse complete existing extraction outputs.
Default: TRUE.
A data.table of processed extraction plan rows.
query()
Run a DuckDB SQL query against the extraction manifest.
EsgStore$query(sql)
sql: SQL query.
A data.table.
summarise()
Summarise extracted Parquet outputs by manifest columns.
EsgStore$summarise(
by = c("source_id", "experiment_id", "variant_label", "frequency", "variable_id",
"site_id", "year")
)
by: Character vector of grouping columns. Default groups by source, experiment, variant, frequency, variable, site and year.
A data.table.
coverage()
Check extraction coverage for planned jobs.
EsgStore$coverage(plan_id = NULL)
plan_id: Optional plan IDs to check.
A data.table with one row per plan.
assert_complete()
Assert that selected extraction plans are complete.
EsgStore$assert_complete(plan_id = NULL)
plan_id: Optional plan IDs to check.
The store object itself, invisibly.
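The planning, extraction, and coverage methods can be strung together as in the sketch below. `extract_site()` is a hypothetical wrapper: the caller supplies `lon`, `lat`, and the length-2 `time` range (its expected format is not stated here), and the function relies on the documented default of running pending or failed plans rather than assuming any column names in the plan table.

```r
# Hypothetical wrapper around the documented regional extraction methods.
extract_site <- function(store, query_id, lon, lat, time) {
  # Plan extraction jobs for one site with the default nearest-cell method
  plan <- store$plan_region(query_id, lon = lon, lat = lat, time = time,
                            method = "nearest")

  # Run pending or failed plans; fall back to HTTPServer if OPeNDAP is unavailable
  processed <- store$extract(status = c("pending", "failed"),
                             fallback = "auto", resume = TRUE)

  # Inspect coverage, then assert that the plans are complete
  coverage <- store$coverage()
  store$assert_complete()

  list(plan = plan, processed = processed, coverage = coverage)
}
```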
clone()
The objects of this class are cloneable with this method.
EsgStore$clone(deep = FALSE)
deep: Whether to make a deep clone.
Hongyuan Jia
install_cli() writes a small platform launcher that runs the current
Rscript with epwshiftr::epwshiftr_cli(exit = TRUE). It does not modify
shell profiles or PATH.
install_cli(bin_dir = NULL, name = "epwshiftr", overwrite = FALSE)
| Argument | Description |
|---|---|
| bin_dir | Directory where the launcher should be written. Defaults to |
| name | Launcher command name. Default: "epwshiftr". |
| overwrite | Whether to replace an existing launcher. Default: FALSE. |
A data.table describing the installed launcher.
is.solr_date() returns TRUE when x is a SolrDate object created by
solr_date() or returned unchanged from it.
is.solr_date(x)
| Argument | Description |
|---|---|
| x | An object to test. |
A single logical value.
is.solr_date(solr_date("2025"))
is.solr_date("2025")
shift_*() functions provide a stage-oriented workflow facade over
EsgQuery, EsgStore, Downloader, and EpwMorpher. Each step returns a
small S7 stage object that can be printed, inspected, saved, and passed to the
next step without manually passing manifest IDs.
shift_request(
  provider = "esgf",
  project = NULL,
  source = NULL,
  experiment = NULL,
  variant = NULL,
  variables = NULL,
  frequency = NULL,
  time = NULL,
  filters = list(),
  options = list(),
  ...
)

shift_site(
  id = NULL,
  lon = NULL,
  lat = NULL,
  label = NULL,
  epw = NULL,
  metadata = list()
)

shift_reference_plan(plan_id, periods)

shift_reference_historical(
  periods,
  experiment = "historical",
  activity = "CMIP",
  match = c("source_id", "variant_label", "frequency", "table_id"),
  filters = list(),
  options = list(),
  collect = list(),
  extract = list(fallback = "auto")
)

shift_collect(
  x,
  store = NULL,
  fields = "*",
  all = TRUE,
  limit = FALSE,
  label = NULL,
  ...
)

shift_download(
  x,
  downloader = NULL,
  run = TRUE,
  background = FALSE,
  resume = TRUE,
  overwrite = FALSE,
  session_label = NULL,
  ...
)

shift_extract(
  x,
  site = NULL,
  periods = NULL,
  variables = NULL,
  time = NULL,
  filters = list(),
  method = "nearest",
  fallback = c("auto", "error"),
  overwrite = FALSE,
  resume = TRUE
)

shift_morph(
  x,
  baseline = NULL,
  recipe = epw_morph_recipe("belcher"),
  reference = NULL,
  reference_plan_id = NULL,
  reference_periods = NULL,
  strict = TRUE,
  by = c("source_id", "experiment_id", "variant_label", "period"),
  overwrite = FALSE,
  resume = TRUE
)

shift_epw(x, dir = NULL, separate = TRUE, overwrite = FALSE, resume = TRUE)

shift_check(x, strict = FALSE, ...)

shift_refresh(x)

shift_ids(x)

shift_datasets(x, all = TRUE, limit = FALSE)

shift_files(x)

shift_data(x, n = 100L, variables = NULL, case_id = NULL, columns = NULL)

shift_diagnostics(x, severity = NULL)

shift_store(x, create = FALSE)

shift_target(x)

shift_coverage(x)

shift_outputs(x)

shift_artifacts(x)

shift_status(x)
| Argument | Description |
|---|---|
| provider | Climate data provider. The first implementation supports |
| project | Optional provider project, for example |
| source, experiment, variant, frequency | Provider-neutral request aliases. In |
| variables | Provider-neutral request alias in |
| time | Optional request or extraction time filter. Numeric years such as |
| filters | Provider-specific query filters in |
| options | Provider-specific request options. For ESGF, |
| ... | Additional provider-specific filters or workflow options. |
| id | Optional site identifier. If |
| lon, lat | Optional site longitude and latitude. Missing values are read from |
| label | Optional label recorded with collected File records. |
| epw | Optional baseline EPW path or eplusr::Epw object. |
| metadata | Optional site metadata. |
| plan_id | Store extraction plan IDs for manually selected reference climate data. |
| periods | A period table, usually from |
| activity | Historical reference activity filter used by |
| match | File metadata fields copied from the future climate stage when resolving an automatic historical reference. |
| collect, extract | Named option lists passed to the automatic historical collect and extract steps. |
| x | A shift stage object. |
| store | An EsgStore, store path, or |
| fields | File fields collected from Dataset records. The default requests all fields and lets the result/store layers preserve and validate provider response metadata. |
| all, limit | Collection controls passed to EsgQuery / EsgResultDataset. |
| downloader | Optional Downloader instance. |
| run | Whether to run queued downloads immediately. Downloading full NetCDF files is optional for the normal workflow because |
| background | Whether to run downloads in a background job. |
| resume | Whether to reuse complete existing downloads, extraction outputs, morphing results, or EPW outputs. |
| overwrite | Whether to overwrite existing downloads, extraction outputs, morphing results, or EPW outputs. |
| session_label | Optional download session label. |
| site | A |
| method | Grid extraction method. |
| fallback | Extraction fallback policy. |
| baseline | Optional baseline EPW path, eplusr::Epw object, or |
| recipe | Morphing recipe, usually from |
| reference | Optional reference |
| reference_plan_id, reference_periods | Optional store plan IDs and period table for reference climate data. |
| strict | If |
| by | Grouping columns used to create morphing cases. |
| dir | Store-relative output directory for generated EPW files. If |
| separate | Whether to create separate output directories per morphing case. |
| n | Maximum number of data rows to read. Use |
| case_id | Optional morphing case IDs to read from morphed or EPW output stages. |
| columns | Optional data columns to keep. |
| severity | Optional diagnostic severities to keep. |
| create | Whether to create a store when |
A shift stage object.
solr_date() parses a scalar input into an internal S7 SolrDate object.
The resulting object can represent a single instant, a Date Math expression,
an unbounded boundary (*), or a Solr range.
solr_date(x)
| Argument | Description |
|---|---|
| x | A scalar input to parse. Supported inputs are: |
Character inputs support the following forms:
- Simplified dates such as "2025", "2025-02", "2025-02-03", and "20250203".
- Datetimes accepted by the internal parser, including ISO-like forms such as "2025-01-15T12:30:45Z", timezone offsets like "+08:00", and common separators such as "/" and ".".
- Solr Date Math expressions rooted at NOW, e.g. "NOW", "NOW-1YEAR", or "NOW/DAY-1YEAR+6MONTHS".
- Fixed-base Date Math expressions of the form "<datetime>Z<math>", e.g. "2025-01-01T00:00:00Z+1MONTH".
- Solr range expressions using the exact separator " TO " and boundary brackets [] or {}, e.g. "[2000 TO 2010]", "{2000 TO 2010]", or "[* TO *]".
Supported Date Math operators are +, -, and /. Supported units are
YEAR, YEARS, MONTH, MONTHS, DAY, DAYS, DATE, HOUR, HOURS,
MINUTE, MINUTES, SECOND, SECONDS, MILLI, MILLIS, MILLISECOND,
and MILLISECONDS.
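To make the operator and unit rules above concrete, the following base-R sketch splits a Date Math expression into its base and a table of steps. It is only an illustration of the syntax, not the parser used by solr_date(); the names date_math_units and split_date_math are hypothetical, and the rule that "/" takes a bare unit while "+" and "-" take an integer amount is an assumption based on standard Solr Date Math.

```r
# Hypothetical helper, not part of epwshiftr: illustrates the Date Math syntax.
date_math_units <- c(
    "YEAR", "YEARS", "MONTH", "MONTHS", "DAY", "DAYS", "DATE",
    "HOUR", "HOURS", "MINUTE", "MINUTES", "SECOND", "SECONDS",
    "MILLI", "MILLIS", "MILLISECOND", "MILLISECONDS"
)

split_date_math <- function(x) {
    stopifnot(is.character(x), length(x) == 1L)
    # The base is either "NOW" or a fixed datetime ending in "Z"
    m <- regmatches(x, regexec("^(NOW|[^Z]+Z)(.*)$", x))[[1L]]
    if (length(m) == 0L) stop("No Date Math base found in: ", x)
    base <- m[[2L]]
    math <- m[[3L]]

    # One step is an operator, an optional integer amount, and a unit
    step <- "([/+-])([0-9]*)([A-Z]+)"
    if (!grepl(paste0("^(", step, ")*$"), math)) {
        stop("Malformed Date Math: ", math)
    }
    tokens <- regmatches(math, gregexpr(step, math))[[1L]]
    parts <- regmatches(tokens, regexec(step, tokens))
    steps <- data.frame(
        op = vapply(parts, `[[`, "", 2L),
        amount = suppressWarnings(as.integer(vapply(parts, `[[`, "", 3L))),
        unit = vapply(parts, `[[`, "", 4L),
        stringsAsFactors = FALSE
    )

    # Assumed rule: "/" rounds to a unit (no amount); "+" and "-" need an amount
    bad_unit <- !steps$unit %in% date_math_units
    bad_amount <- (steps$op == "/") != is.na(steps$amount)
    if (any(bad_unit | bad_amount)) {
        stop("Unsupported Date Math step in: ", math)
    }
    list(base = base, steps = steps)
}

res <- split_date_math("NOW/DAY-1YEAR+6MONTHS")
print(res$base)
print(res$steps)
```

Keeping the unit list as a plain character vector makes it easy to compare against the units documented above; an unknown unit such as "FORTNIGHT" is rejected rather than silently ignored.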
Use format() or as.character() to render a parsed value. format()
supports as = "iso" and as = "num". as.POSIXct() can be used on
instants; for ranges it returns the start boundary with a warning, and for
unbounded or Date Math values it errors because no single concrete instant is
available.
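Because ranges carry two boundaries, it can help to see how the range form described above decomposes. The base-R sketch below splits a Solr range string into its lower and upper boundary and records whether each side is inclusive ("[" or "]") or exclusive ("{" or "}"). The helper name parse_solr_range is hypothetical and this is not how solr_date() is implemented; it only illustrates the exact " TO " separator and the bracket rules.

```r
# Hypothetical helper, not part of epwshiftr: illustrates the Solr range syntax.
parse_solr_range <- function(x) {
    stopifnot(is.character(x), length(x) == 1L)
    # Opening bracket, lower bound, the exact separator " TO ",
    # upper bound, closing bracket
    pattern <- "^([\\[{])(.+) TO (.+)([\\]}])$"
    m <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1L]]
    if (length(m) == 0L) stop("Not a Solr range expression: ", x)
    list(
        lower = m[[3L]],
        upper = m[[4L]],
        lower_inclusive = m[[2L]] == "[",
        upper_inclusive = m[[5L]] == "]"
    )
}

str(parse_solr_range("{2000 TO 2010]"))
str(parse_solr_range("[* TO *]"))
```

A lower-case separator such as "[2000 to 2010]" does not match the pattern and raises an error, which mirrors the requirement that the separator be written exactly as " TO ".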
An internal S7 object inheriting from SolrDate. The exact
subclass is an implementation detail and may represent a single instant,
a Date Math expression, an unbounded boundary, or a range.
solr_date("2025")
solr_date("2025-02")
solr_date("20250203")
solr_date("2025-01-15T12:30:45Z")
solr_date("NOW")
solr_date("NOW/DAY-1YEAR+6MONTHS")
solr_date("2025-01-01T00:00:00Z+1MONTH")
solr_date("[2000 TO 2010]")
solr_date("{2000 TO 2010]")
solr_date("[* TO *]")

x <- solr_date("2025-01-15T12:30:45Z")
format(x)
format(x, as = "num")
as.character(x)
as.POSIXct(x)
is.solr_date(x)
print(x)
store_dir() returns the root directory used for persistent epwshiftr store
artifacts, including query snapshots, dictionaries, sources, downloads,
extracted data, generated outputs, and the DuckDB manifest.
store_dir(init = TRUE)
| Argument | Description |
|---|---|
| init | If |
A single string indicating the directory location.
Remove a launcher generated by install_cli().
uninstall_cli(bin_dir = NULL, name = "epwshiftr")
| Argument | Description |
|---|---|
| bin_dir | Directory where the launcher should be written. Defaults to |
| name | Launcher command name. Default: "epwshiftr". |
A data.table describing the uninstall result.