Title: | Create Future 'EnergyPlus' Weather Files using 'CMIP6' Data |
---|---|
Description: | Query, download climate change projection data from the 'CMIP6' (Coupled Model Intercomparison Project Phase 6) project <https://pcmdi.llnl.gov/CMIP6/> in the 'ESGF' (Earth System Grid Federation) platform <https://esgf.llnl.gov>, and create future 'EnergyPlus' <https://energyplus.net> Weather ('EPW') files adjusted from climate changes using data from Global Climate Models ('GCM'). |
Authors: | Hongyuan Jia [aut, cre] |
Maintainer: | Hongyuan Jia <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.4.9001 |
Built: | 2025-02-11 04:28:04 UTC |
Source: | https://github.com/ideas-lab-nus/epwshiftr |
Query and download climate change projection data from the CMIP6 (Coupled Model Intercomparison Project Phase 6) project on the ESGF (Earth System Grid Federation) platform, and create future EnergyPlus Weather (EPW) files adjusted for climate change using data from Global Climate Models (GCMs).
epwshiftr.verbose: If TRUE, more detailed messages will be printed. Default: FALSE.
epwshiftr.threshold_alpha: The threshold of the absolute value of alpha, i.e. the monthly-mean fractional change, when performing morphing operations. Default: 3. If the morphing method is set to "stretch" or "combined" and the absolute alpha exceeds the threshold value, a warning is issued and the morphing method falls back to "shift" to avoid unrealistic morphed values.
epwshiftr.dir: The directory to store package data, including the CMIP6 model output file index. If not set, the current user data directory is used.
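The options above are ordinary R global options, so they can be set with options() and read back with getOption(). The sketch below sets them for a session and then mimics the documented fallback rule. Note that pick_morph_method() is a hypothetical helper written only for illustration, not a function exported by epwshiftr, and the alpha values are made up.

```r
# Set the package options for the current session (base R only).
options(
    epwshiftr.verbose = TRUE,
    epwshiftr.threshold_alpha = 3,
    epwshiftr.dir = tempdir()
)

# Hypothetical helper mirroring the documented rule: "stretch" and
# "combined" fall back to "shift" when |alpha| exceeds the threshold.
pick_morph_method <- function(alpha, method = c("stretch", "shift", "combined")) {
    method <- match.arg(method)
    threshold <- getOption("epwshiftr.threshold_alpha", 3)
    if (method != "shift" && any(abs(alpha) > threshold)) {
        warning("absolute alpha exceeds ", threshold, "; falling back to \"shift\"")
        method <- "shift"
    }
    method
}

# Made-up monthly-mean fractional changes, for illustration only.
ok  <- pick_morph_method(c(0.1, -0.4, 1.2), "stretch")
bad <- suppressWarnings(pick_morph_method(c(0.1, 4.5), "combined"))
```

How the package applies the threshold internally may differ; the sketch only restates the behaviour described for epwshiftr.threshold_alpha.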
Hongyuan Jia
Useful links:
Report bugs at https://github.com/ideas-lab-nus/epwshiftr/issues
The Cmip6Dict object provides functionality to fetch the latest CMIP6 Controlled Vocabularies (CVs) and Data Request (DReq) information.
cmip6_dict()
The CMIP6 CVs give a well-defined set of global attributes that are recorded in each CMIP6 model output, providing the information necessary for interpreting the data. The CMIP6 CV data is stored as JSON files in the WCRP-CMIP GitHub Repo.
The CMIP6 DReq defines all the quantities from CMIP6 simulations that should be archived. This includes both quantities of general interest needed from most of the CMIP6-endorsed model intercomparison projects (MIPs) and quantities that are more specialized and only of interest to a single endorsed MIP. The raw DReq data is stored as a Microsoft Excel file (CMIP6_MIP_tables.xlsx) in a Subversion repo. The Cmip6Dict object uses the parsed DReq data that is stored in the GitHub Repo.
For more information, please see:
version()
Get the version of CVs and Data Request
Cmip6Dict$version()
A list of two elements:
cvs: A numeric_version object giving the version of CVs
dreq: A numeric_version object giving the version of Data Request
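Because both elements are numeric_version objects, they can be compared with the usual relational operators instead of as text. A small base-R sketch follows; the version strings are placeholders chosen for illustration, not values reported by the package.

```r
# Placeholder list shaped like the documented return value of $version().
ver <- list(
    cvs  = numeric_version("6.2.58.64"),
    dreq = numeric_version("1.0.33")
)

# numeric_version compares component by component, not character by character.
cvs_is_recent <- ver$cvs >= "6.2.58"

# Convert back to plain strings, e.g. for logging.
as_text <- vapply(ver, as.character, character(1))
```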
is_empty()
Is it an empty Cmip6Dict?
$is_empty() checks if this Cmip6Dict is empty, i.e. the $build() or $load() method hasn't been called yet and there is no CV or Data Request data.
Cmip6Dict$is_empty()
A single logical value of TRUE or FALSE.
timestamp()
Get the last modified time for CVs
Cmip6Dict$timestamp()
A list of 14 DateTimes:
"cvs": The last modified time for the whole CV collection
"drs": The last modified time for Data Reference Syntax (DRS)
"activity_id": The last modified time for Activity ID
"experiment_id": The last modified time for Experiment ID
"frequency": The last modified time for Frequency
"grid_label": The last modified time for Grid Label
"institution_id": The last modified time for Institution ID
"nominal_resolution": The last modified time for Nominal Resolution
"realm": The last modified time for Realm
"required_global_attributes": The last modified time for Required Global Attributes
"source_id": The last modified time for Source ID
"source_type": The last modified time for Source Type
"sub_experiment_id": The last modified time for Sub-Experiment ID
"table_id": The last modified time for Table ID
built_time()
Get the time when the dictionary was built
Cmip6Dict$built_time()
A DateTime
build()
Fetch and parse all data of CVs and Data Request
Cmip6Dict$build(token = NULL, force = FALSE)
token
A GitHub token string used to access the GitHub REST APIs. If NULL, the GITHUB_PAT or GITHUB_TOKEN environment variable is used if it exists. Default: NULL.
force
Whether to force a rebuild of the dict when it has already been built. Default: FALSE.
The updated Cmip6Dict itself.
get()
Get the data for a specific CV or Data Request
Cmip6Dict$get(type)
type
A single string indicating the type of data to list. Should be one of:
"drs": Data Reference Syntax (DRS)
"activity_id": Activity ID
"experiment_id": Experiment ID
"frequency": Frequency
"grid_label": Grid Label
"institution_id": Institution ID
"nominal_resolution": Nominal Resolution
"realm": Realm
"required_global_attributes": Required Global Attributes
"source_id": Source ID
"source_type": Source Type
"sub_experiment_id": Sub-Experiment ID
"table_id": Table ID
"dreq": Data Request
For "drs", "activity_id", "frequency", "grid_label", "institution_id", "source_type" and "sub_experiment_id", a list.
For "experiment_id", "source_id" and "dreq", a data.table.
For "nominal_resolution", "required_global_attributes" and "table_id", a character vector.
save()
Save the Cmip6Dict object
$save() stores all the core data of the current Cmip6Dict object in an RDS file named CMIP6DICT in the specified folder. This file can be reloaded via the $load() method to restore the last state of the current Cmip6Dict object.
Cmip6Dict$save(dir = getOption("epwshiftr.dir", "."))
dir
A single string giving the directory to save the RDS file. Default is set to the global option epwshiftr.dir. The directory will be created if it does not exist. If this global option is not set, the current working directory is used.
A single string giving the full path of the RDS file.
load()
Load the saved Cmip6Dict object from file
$load() loads the RDS file named CMIP6DICT that is created using the $save() method. Please note that the file name should be exactly CMIP6DICT, without a file extension.
Cmip6Dict$load(dir = getOption("epwshiftr.dir", "."))
dir
A single string giving the directory to find the RDS file. Default is set to the global option epwshiftr.dir. If this global option is not set, the current working directory is used.
A single string giving the full path of the RDS file.
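The round trip described by $save() and $load() can be pictured with base R alone. This is only a sketch, under the assumption that the file is a plain RDS written with saveRDS(); the stand-in list below is hypothetical and is not the real internal structure of a Cmip6Dict.

```r
# Stand-in for the dictionary's core data (structure is hypothetical).
core <- list(built_time = Sys.time(), cvs = list(frequency = c("day", "mon")))

# Point the documented option at a scratch directory for this demo.
old <- options(epwshiftr.dir = file.path(tempdir(), "epwshiftr-demo"))

# Resolve the directory the same way the documented default does.
dir <- getOption("epwshiftr.dir", ".")
if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)

# The file is named exactly CMIP6DICT, with no file extension.
path <- file.path(dir, "CMIP6DICT")
saveRDS(core, path)
restored <- readRDS(path)

# Restore the previous option value.
options(old)
```

With the real object, dict$save() and new_dict$load() take care of these steps, as in the examples for this class.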
print()
Print a summary of the current Cmip6Dict object
$print() gives a summary of the current Cmip6Dict object, including the version of CVs and Data Request, and the last built time.
Cmip6Dict$print()
The Cmip6Dict object itself, invisibly.
Hongyuan Jia
## Not run:
# create a new Cmip6Dict object
dict <- cmip6_dict()

# by default, there is no data when the Cmip6Dict was created
dict$is_empty()

# fetch and parse all CVs and Data Request data
dict$build()

# get the version of CVs and Data Request
dict$version()

# get the last modified time for each CV and Data Request
dict$timestamp()

# get the time when the dict was built
dict$built_time()

# get the data of CVs and DReq
dict$get("activity_id")
dict$get("experiment_id")
dict$get("sub_experiment_id")
dict$get("institution_id")
dict$get("source_id")
dict$get("table_id")
dict$get("frequency")
dict$get("grid_label")
dict$get("realm")
dict$get("source_type")
dict$get("dreq")

# save the dict object for later usage
# default location is the value of global option "epwshiftr.dir"
dict$save()

# the saved dict object can be reloaded
new_dict <- cmip6_dict()
new_dict$load()

# print will show the version summary and the last built time
dict$print()
## End(Not run)
Query CMIP6 data using ESGF search RESTful API
esgf_query(
    activity = "ScenarioMIP",
    variable = c("tas", "tasmax", "tasmin", "hurs", "hursmax", "hursmin", "pr", "rsds", "rlds", "psl", "sfcWind", "clt"),
    frequency = "day",
    experiment = c("ssp126", "ssp245", "ssp370", "ssp585"),
    source = c("AWI-CM-1-1-MR", "BCC-CSM2-MR", "CESM2", "CESM2-WACCM", "EC-Earth3", "EC-Earth3-Veg", "GFDL-ESM4", "INM-CM4-8", "INM-CM5-0", "MPI-ESM1-2-HR", "MRI-ESM2-0"),
    variant = "r1i1p1f1",
    replica = FALSE,
    latest = TRUE,
    resolution = c("100 km", "50 km"),
    type = "Dataset",
    limit = 10000L,
    data_node = NULL
)
Argument | Description |
activity | A character vector indicating activity identifiers. Default: "ScenarioMIP". |
variable | A character vector indicating variable identifiers. The 12 variables most relevant to EPW are set as defaults. If |
frequency | A character vector of sampling frequency. If |
experiment | A character vector indicating root experiment identifiers. The Tier-1 experiments of activity ScenarioMIP are set as defaults. If |
source | A character vector indicating model identifiers. Defaults are set to 11 sources which give outputs of all 4 experiments of activity ScenarioMIP with daily frequency, i.e. |
variant | A character vector indicating label constructed from 4 indices stored as global attributes in format |
replica | Whether the record is the "master" copy, or a replica. Use |
latest | Whether the record is the latest available version, or a previous version. Use |
resolution | A character vector indicating approximate horizontal resolution. Default: c("100 km", "50 km"). |
type | A single string indicating the intrinsic type of the record. Should be either |
limit | An integer indicating the maximum number of matched records to return. Should be <= 10,000. Default: 10000L. |
data_node | A character vector indicating data nodes to be queried. Defaults to NULL. |
The Earth System Grid Federation (ESGF) is an international collaboration for the software that powers most global climate change research, notably assessments by the Intergovernmental Panel on Climate Change (IPCC).
The ESGF search service exposes a RESTful URL that clients can use to query the contents of the underlying search index and retrieve results matching the given constraints. With the distributed capabilities of the ESGF search, the URL at any Index Node can be used to query that Node only, or all Nodes in the ESGF system. esgf_query() uses the LLNL (Lawrence Livermore National Laboratory) Index Node.
The core Controlled Vocabularies (CVs) for use in CMIP6, including all activities, experiments, sources (GCMs) and frequencies, can be found at the WCRP-CMIP/CMIP6_CVs GitHub repo.
A data.table::data.table with an attribute named response, which is a list converted from the JSON response. If no matched data is found, an empty data.table is returned. Otherwise, the columns of the returned data vary based on the type:
If "Dataset", returned columns are:
No. | Column | Type | Description |
1 | dataset_id | Character | Dataset universal identifier |
2 | mip_era | Character | Activity's associated CMIP cycle. Will always be "CMIP6" |
3 | activity_drs | Character | Activity DRS (Data Reference Syntax) |
4 | institution_id | Character | Institution identifier |
5 | source_id | Character | Model identifier |
6 | experiment_id | Character | Root experiment identifier |
7 | member_id | Character | A compound construction from sub_experiment_id and variant_label |
8 | table_id | Character | Table identifier, i.e. sampling frequency identifier |
9 | frequency | Character | Sampling frequency |
10 | grid_label | Character | Grid identifier |
11 | version | Character | Approximate date of model output file |
12 | nominal_resolution | Character | Approximate horizontal resolution |
13 | variable_id | Character | Variable identifier |
14 | variable_long_name | Character | Variable long name |
15 | variable_units | Character | Units of variable |
16 | data_node | Character | Data node to download the model output file |
17 | dataset_pid | Character | A unique string that helps identify the dataset |
If "File", returned columns are:
No. | Column | Type | Description |
1 | file_id | Character | Model output file universal identifier |
2 | dataset_id | Character | Dataset universal identifier |
3 | mip_era | Character | Activity's associated CMIP cycle. Will always be "CMIP6" |
4 | activity_drs | Character | Activity DRS (Data Reference Syntax) |
5 | institution_id | Character | Institution identifier |
6 | source_id | Character | Model identifier |
7 | experiment_id | Character | Root experiment identifier |
8 | member_id | Character | A compound construction from sub_experiment_id and variant_label |
9 | table_id | Character | Table identifier, i.e. sampling frequency identifier |
10 | frequency | Character | Sampling frequency |
11 | grid_label | Character | Grid identifier |
12 | version | Character | Approximate date of model output file |
13 | nominal_resolution | Character | Approximate horizontal resolution |
14 | variable_id | Character | Variable identifier |
15 | variable_long_name | Character | Variable long name |
16 | variable_units | Character | Units of variable |
17 | datetime_start | POSIXct | Start date and time of simulation |
18 | datetime_end | POSIXct | End date and time of simulation |
19 | file_size | Character | Model output file size in Bytes |
20 | data_node | Character | Data node to download the model output file |
21 | file_url | Character | Model output file download url from HTTP server |
22 | tracking_id | Character | A unique string that helps identify the output file |
https://github.com/ESGF/esgf.github.io/wiki/ESGF_Search_REST_API
## Not run:
esgf_query(variable = "rss", experiment = "ssp126", resolution = "100 km", limit = 1)

esgf_query(variable = "rss", experiment = "ssp126", type = "File", limit = 1)
## End(Not run)
The Earth System Grid Federation (ESGF) is an international collaboration for the software that powers most global climate change research, notably assessments by the Intergovernmental Panel on Climate Change (IPCC).
The ESGF search service exposes RESTful APIs that can be used by clients to query the contents of the underlying search index, and return results matching the given constraints. The documentation of the APIs can be found using this link
EsgfQuery is the workhorse for dealing with ESGF search services.
query_esgf(host = "https://esgf-node.llnl.gov/esg-search")
host | The URL to the ESGF Search API service. This should be the URL of the ESGF search service excluding the final endpoint name. Usually this is http://<hostname>/esg-search. Default: "https://esgf-node.llnl.gov/esg-search". |
EsgfQuery object
query_esgf() returns an EsgfQuery object, which is an R6 object with quite a few methods that can be classified into 3 categories:
Value listing: methods to list all possible values of facets, shards, etc.
Parameter getter & setter: methods to get the query parameter values or set them before sending the actual query to the ESGF search services.
Query responses: methods to collect results for the query response.
When creating an EsgfQuery object, a facet listing query is sent to the index node to get all available facets and shards for the default project (CMIP6). The EsgfQuery object provides three value-listing methods to extract data from the response of the facet listing query:
EsgfQuery$list_all_facets(): List all available facet names.
EsgfQuery$list_all_shards(): List all available shards.
EsgfQuery$list_all_values(): List all available values of a specific facet.
The ESGF search services support a large number of parameters. The EsgfQuery object contains dedicated methods to set values for most of them, including:
Most common keywords: facets, offset, limit, fields, type, replica, latest, distrib and shards.
Most common facets: project, activity_id, experiment_id, source_id, variable_id, frequency, variant_label, nominal_resolution and data_node.
All methods act in a similar way:
If input is given, the corresponding parameter is set and the updated EsgfQuery object is returned. This makes it possible to chain different parameter setters, e.g. EsgfQuery$project("CMIP6")$frequency("day")$limit(1) sets the parameters project, frequency and limit sequentially. For parameters that take character inputs, you can put a preceding ! to negate the constraints, e.g. EsgfQuery$project(!"CMIP6") searches for all projects except CMIP6.
If no input is given, the current parameter value is returned. For example, directly calling EsgfQuery$project() returns the current value of the project parameter. The returned value can be one of two types:
NULL, i.e. there is no constraint on the corresponding parameter
An EsgfQueryParam object, which is essentially a list of three elements:
value: The input values
negate: Whether there is a preceding ! in the input
name: The parameter name
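In plain R, !"CMIP6" is an error when evaluated, so a setter that accepts this notation has to look at the unevaluated argument. The sketch below shows one way such a check could work using substitute(). Here parse_param() is a hypothetical helper written for illustration; it is an assumption about the general technique, not the package's actual implementation.

```r
# Hypothetical helper: inspect the unevaluated argument for a leading `!`
# and return a list shaped like the documented EsgfQueryParam elements.
parse_param <- function(value, name) {
    expr <- substitute(value)
    negate <- is.call(expr) && identical(expr[[1L]], as.name("!"))
    # Strip the `!` so that only the character input is evaluated.
    if (negate) expr <- expr[[2L]]
    list(value = eval(expr, parent.frame()), negate = negate, name = name)
}

plain   <- parse_param("CMIP6", "project")
negated <- parse_param(!c("CMIP5", "CMIP6"), "project")
```

The three elements of the result correspond to value, negate and name as listed above.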
Besides the methods for specific keywords and facets, you can specify arbitrary query parameters using the EsgfQuery$params() method. For details on the usage, please see the documentation.
The query is not sent unless one of the following methods is called:
EsgfQuery$count(): Count the total number of records that match the query. You can return only the total number of matched records by calling EsgfQuery$count(facets = FALSE). You can also count the matched records for specified facets, e.g. EsgfQuery$count(facets = c("source_id", "activity_id")).
EsgfQuery$collect(): Collect the query results and format them into a data.table.
The EsgfQuery object also provides several other helper methods:
EsgfQuery$build_cache(): By default, EsgfQuery$build_cache() is called when initializing a new EsgfQuery object, so in general there is no need to call it separately. EsgfQuery$build_cache() sends a facet listing query to the index node and stores the response internally. The response contains all available facets and shards and is used as a source for validating user input for parameter setters.
EsgfQuery$url(): Returns the actual query URL or the wget script URL, which can be used to download all files matching the given constraints.
EsgfQuery$response(): Returns the actual response of EsgfQuery$count() and EsgfQuery$collect(). It is a named list generated from the JSON response using jsonlite::fromJSON().
EsgfQuery$print(): Print a summary of the current EsgfQuery object, including the host URL, the built time of the facet cache and all query parameters.
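To make the relationship between query parameters and a search URL more concrete, here is a base-R sketch that assembles one by hand. build_search_url() is a hypothetical helper, not part of epwshiftr. It assumes that the search endpoint is named search under the host URL, that unset (NULL) parameters are simply omitted, that multiple values are comma-separated, and that a negated constraint is written as name!=value. The URL actually returned by EsgfQuery$url() may differ.

```r
# Hypothetical helper: turn a named list of parameters into a query URL.
build_search_url <- function(host, params, negate = character()) {
    # Drop parameters that are not set.
    params <- Filter(Negate(is.null), params)
    parts <- vapply(names(params), function(nm) {
        val <- params[[nm]]
        # Logical flags are written in lower case, e.g. latest=true.
        if (is.logical(val)) val <- tolower(as.character(val))
        # Percent-encode each value separately, then join with commas.
        val <- vapply(as.character(val), utils::URLencode, character(1),
            reserved = TRUE, USE.NAMES = FALSE)
        op <- if (nm %in% negate) "!=" else "="
        paste0(nm, op, paste(val, collapse = ","))
    }, character(1))
    paste0(host, "/search?", paste(parts, collapse = "&"))
}

host <- "https://esgf-node.llnl.gov/esg-search"
url <- build_search_url(
    host,
    list(project = "CMIP6", frequency = "day", nominal_resolution = "100 km",
        replica = NULL, latest = TRUE, limit = 1L)
)
url_negated <- build_search_url(host, list(project = "CMIP6"), negate = "project")
```

Nothing is sent over the network here; the helper only builds strings, which mirrors the documented behaviour that a query is sent only when $count() or $collect() is called.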
new()
Create a new EsgfQuery object
On initialization, a facet listing query is sent to the index node to get all available facets and shards. This information is used to validate inputs for the activity_id and source_id facets, among others.
EsgfQuery$new(host = "https://esgf-node.llnl.gov/esg-search")
host
The URL to the ESGF Search API service. This should be the URL of the ESGF search service excluding the final endpoint name. Usually this is http://<hostname>/esg-search. The default is to use the LLNL (Lawrence Livermore National Laboratory) Index Node, which is "https://esgf-node.llnl.gov/esg-search".
An EsgfQuery object.
\dontrun{
q <- EsgfQuery$new(host = "https://esgf-node.llnl.gov/esg-search")
q
}
build_cache()
Build facet cache used for input validation
A facet cache is data that is fetched using a facet listing query to the index node. It contains all available facets and shards that can be used as parameter values within a specific project.
By default, $build_cache() is called when initializing a new EsgfQuery object for the default project (CMIP6). So in general, there is no need to call this method unless you want to rebuild the cache for a different project after calling $project().
EsgfQuery$build_cache()
The modified EsgfQuery object.
\dontrun{ q$build_cache() }
list_all_facets()
List all available facet names
EsgfQuery$list_all_facets()
A character vector.
\dontrun{ q$list_all_facets() }
list_all_shards()
List all available shards
EsgfQuery$list_all_shards()
A character vector.
\dontrun{ q$list_all_shards() }
list_all_values()
List all available values of a specific facet
EsgfQuery$list_all_values(facet)
facet
A single string giving the facet name.
A named character vector.
\dontrun{ q$list_all_values("activity_id") }
project()
Get or set the project facet parameter.
EsgfQuery$project(value = "CMIP6")
value
The parameter value. Default: "CMIP6". There are two options:
If value is not given, the current value is returned.
A character vector or NULL. Note that you can put a preceding ! to negate the facet constraints. For example, $project(!c("CMIP5", "CMIP6")) searches for all projects except CMIP5 and CMIP6.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$project()
# set the parameter
q$project("CMIP6")
# negate the project constraints
q$project(!"CMIP6")
# remove the parameter
q$project(NULL)
}
activity_id()
Get or set the activity_id facet parameter.
EsgfQuery$activity_id(value)
value
The parameter value. Default: NULL. There are two options:
If value is not given, the current value is returned.
A character vector or NULL. Note that you can put a preceding ! to negate the facet constraints. For example, $activity_id(!c("C4MIP", "GeoMIP")) searches for all activity_ids except C4MIP and GeoMIP.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$activity_id()
# set the parameter
q$activity_id("ScenarioMIP")
# negate the constraints
q$activity_id(!c("CFMIP", "ScenarioMIP"))
# remove the parameter
q$activity_id(NULL)
}
experiment_id()
Get or set the experiment_id facet parameter.
EsgfQuery$experiment_id(value)
value
The parameter value. Default: NULL. There are two options:
If value is not given, the current value is returned.
A character vector or NULL. Note that you can put a preceding ! to negate the facet constraints. For example, $experiment_id(!c("ssp126", "ssp245")) searches for all experiment_ids except ssp126 and ssp245.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$experiment_id()
# set the parameter
q$experiment_id(c("ssp126", "ssp585"))
# negate the constraints
q$experiment_id(!c("ssp126", "ssp585"))
# remove the parameter
q$experiment_id(NULL)
}
source_id()
Get or set the source_id facet parameter.
EsgfQuery$source_id(value)
value
The parameter value. Default: NULL. There are two options:
If value is not given, the current value is returned.
A character vector or NULL. Note that you can put a preceding ! to negate the facet constraints. For example, $source_id(!c("CESM2", "CESM2-FV2")) searches for all source_ids except CESM2 and CESM2-FV2.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$source_id()
# set the parameter
q$source_id(c("BCC-CSM2-MR", "CESM2"))
# negate the constraints
q$source_id(!c("BCC-CSM2-MR", "CESM2"))
# remove the parameter
q$source_id(NULL)
}
variable_id()
Get or set the variable_id facet parameter.
EsgfQuery$variable_id(value)
value
The parameter value. Default: NULL. There are two options:
If value is not given, the current value is returned.
A character vector or NULL. Note that you can put a preceding ! to negate the facet constraints. For example, $variable_id(!c("tas", "pr")) searches for all variable_ids except tas and pr.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$variable_id()
# set the parameter
q$variable_id(c("tas", "pr"))
# negate the constraints
q$variable_id(!c("tas", "pr"))
# remove the parameter
q$variable_id(NULL)
}
frequency()
Get or set the frequency facet parameter.
EsgfQuery$frequency(value)
value
The parameter value. Default: NULL. There are two options:
If value is not given, the current value is returned.
A character vector or NULL. Note that you can put a preceding ! to negate the facet constraints. For example, $frequency(!c("day", "mon")) searches for all frequencies except day and mon.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$frequency()
# set the parameter
q$frequency(c("1hr", "day"))
# negate the constraints
q$frequency(!c("1hr", "day"))
# remove the parameter
q$frequency(NULL)
}
variant_label()
Get or set the variant_label facet parameter.
EsgfQuery$variant_label(value)
value
The parameter value. Default: NULL. There are two options:
If value is not given, the current value is returned.
A character vector or NULL. Note that you can put a preceding ! to negate the facet constraints. For example, $variant_label(!c("r1i1p1f1", "r2i1p1f1")) searches for all variant_labels except r1i1p1f1 and r2i1p1f1.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$variant_label()
# set the parameter
q$variant_label(c("r1i1p1f1", "r1i2p1f1"))
# negate the constraints
q$variant_label(!c("r1i1p1f1", "r1i2p1f1"))
# remove the parameter
q$variant_label(NULL)
}
nominal_resolution()
Get or set the nominal_resolution facet parameter.
EsgfQuery$nominal_resolution(value)
value
The parameter value. Default: NULL. There are two options:
If value is not given, the current value is returned.
A character vector or NULL. Note that you can put a preceding ! to negate the facet constraints. For example, $nominal_resolution(!c("50 km", "1x1 degree")) searches for all nominal_resolutions except 50 km and 1x1 degree.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$nominal_resolution()
# set the parameter
q$nominal_resolution(c("100 km", "1x1 degree"))
# negate the constraints
q$nominal_resolution(!c("100 km", "1x1 degree"))
# remove the parameter
q$nominal_resolution(NULL)
}
data_node()
Get or set the data_node parameter.
EsgfQuery$data_node(value)
value
The parameter value. Default: NULL. There are two options:
If value is not given, the current value is returned.
A character vector or NULL. Note that you can put a preceding ! to negate the facet constraints. For example, $data_node(!c("cmip.bcc.cma.cn", "esg.camscma.cn")) searches for all data_nodes except cmip.bcc.cma.cn and esg.camscma.cn.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$data_node()
# set the parameter
q$data_node("esg.lasg.ac.cn")
# negate the constraints
q$data_node(!"esg.lasg.ac.cn")
# remove the parameter
q$data_node(NULL)
}
facets()
Get or set the facets parameter for a facet counting query.
Note that $facets() only affects the $count() method when sending a facet counting query.
EsgfQuery$facets(value)
value
The facet parameter value. Default: NULL. There are two options:
If value is not given, the current value is returned.
A character vector or NULL. The special notation "*" can be used to indicate that all available facets should be considered.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$facets()
# set the facets
q$facets(c("activity_id", "source_id"))
# use all available facets
q$facets("*")
}
fields()
Get or set the fields parameter.
By default, all available metadata fields are returned for each query. $fields() can be used to limit the number of fields returned in the query response.
EsgfQuery$fields(value = "*")
value
The facet parameter value. Default: "*". There are two options:
If value is not given, the current value is returned.
A character vector or NULL. The special notation "*" can be used to indicate that all available fields should be considered.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$fields()
# set the fields
q$fields(c("activity_id", "source_id"))
# use all available fields
q$fields("*")
# remove the parameter
# act the same as above because the default `fields` in ESGF search
# services is `*` if `fields` is not specified
q$fields(NULL)
}
shards()
Get or set the shards parameter.
By default, a distributed query targets all ESGF Nodes. $shards() can be used to execute a distributed search that targets only one or more specific nodes. All available shards can be retrieved using the $list_all_shards() method.
EsgfQuery$shards(value)
value
The facet parameter value. There are two options:
If value is not given, the current value is returned.
A character vector or NULL.
If value is given, the modified EsgfQuery object. Otherwise, an EsgfQueryParam object, which is essentially a list of three elements:
value: input values.
negate: Whether there is a preceding !.
name: Parameter name.
\dontrun{
# get current value
q$shards()
# set the parameter
q$shards("localhost:8983/solr/datasets")
# negate the constraints
q$shards(!"localhost:8983/solr/datasets")
# only applicable for distributed queries
q$distrib(FALSE)$shards("localhost:8983/solr/datasets") # Error
# remove the parameter
q$shards(NULL)
}
replica()
Get or set the replica
parameter.
By default, a query returns all records (masters and replicas)
matching the search criteria, i.e. $replica(NULL)
.
To return only master records, use $replica(FALSE)
; to return only
replicas, use $replica(TRUE)
.
EsgfQuery$replica(value)
value
The facet parameter value. Default: NULL
.
There are two options:
If value
is not given, current value is returned.
A flag or NULL
.
If value
is given, the modified EsgfQuery
object.
Otherwise, an EsgfQueryParam
object which is essentially a list of three elements:
value
: input values.
negate
: Whether there is a preceding !
.
name
: Parameter name.
    \dontrun{
    # get current value
    q$replica()
    # set the parameter
    q$replica(TRUE)
    # remove the parameter
    q$replica(NULL)
    }
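The three settings described above can be pictured as a filter over the matching records. The sketch below uses base R only and does not contact ESGF; the `records` data frame and the helper `filter_replica()` are hypothetical and exist only for illustration.

```r
# Toy records: `replica` is FALSE for master copies and TRUE for replicas.
records <- data.frame(
  id = c("a", "b", "c"),
  replica = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented meaning of `replica`:
#   NULL  -> masters and replicas
#   FALSE -> masters only
#   TRUE  -> replicas only
filter_replica <- function(records, replica = NULL) {
  if (is.null(replica)) return(records)
  records[records$replica == replica, , drop = FALSE]
}

filter_replica(records)$id         # all records
filter_replica(records, FALSE)$id  # master records only
filter_replica(records, TRUE)$id   # replicas only
```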
latest()
Get or set the latest
parameter.
By default, a query to the ESGF search services returns only the very
last, up-to-date version of the matching records, i.e.
$latest(TRUE)
. You can use $latest(FALSE)
to return all versions.
EsgfQuery$latest(value = TRUE)
value
The facet parameter value. Default: TRUE
.
There are two options:
If value
is not given, current value is returned.
A flag.
If value
is given, the modified EsgfQuery
object.
Otherwise, an EsgfQueryParam
object which is essentially a list of three elements:
value
: input values.
negate
: Whether there is a preceding !
.
name
: Parameter name.
    \dontrun{
    # get current value
    q$latest()
    # set the parameter
    q$latest(TRUE)
    }
type()
Get or set the type
parameter.
There are three types in total: Dataset
, File
or Aggregation
.
EsgfQuery$type(value = "Dataset")
value
The facet parameter value. Default: "Dataset"
.
There are two options:
If value
is not given, current value is returned.
A string.
If value
is given, the modified EsgfQuery
object.
Otherwise, an EsgfQueryParam
object which is essentially a list of three elements:
value
: input values.
negate
: Whether there is a preceding !
.
name
: Parameter name.
    \dontrun{
    # get current value
    q$type()
    # set the parameter
    q$type("Dataset")
    }
limit()
Get or set the limit
parameter.
$limit()
can be used to limit the number of records to return.
Note that the maximum number of records to return per query for ESGF
search services is 10,000. A warning is issued if input value is
greater than that. In this case, limit
will be reset to 10,000.
EsgfQuery$limit(value = 10L)
value
The facet parameter value. Default: 10
.
There are two options:
If value
is not given, current value is returned.
An integer.
If value
is given, the modified EsgfQuery
object.
Otherwise, an EsgfQueryParam
object which is essentially a list of three elements:
value
: input values.
negate
: Whether there is a preceding !
.
name
: Parameter name.
    \dontrun{
    # get current value
    q$limit()
    # set the parameter
    q$limit(10L)
    # `limit` is reset to 10,000 if input is greater than that
    q$limit(10001L) # warning
    }
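The capping rule described above can be sketched in a few lines of base R. `cap_limit()` is a hypothetical helper written for this illustration, not a function exported by the package, and the package's own warning text may differ.

```r
# Hypothetical helper: cap a requested limit at the ESGF maximum of 10,000.
cap_limit <- function(value, max_limit = 10000L) {
  stopifnot(is.numeric(value), length(value) == 1L, value >= 0)
  value <- as.integer(value)
  if (value > max_limit) {
    warning("`limit` exceeds ", max_limit, "; resetting it to ", max_limit, ".")
    value <- max_limit
  }
  value
}

cap_limit(10L)     # unchanged
cap_limit(10000L)  # unchanged: equal to the maximum, not greater
capped <- suppressWarnings(cap_limit(20000L))  # reset to the maximum
```

Note that a value equal to 10,000 passes through without a warning; only larger values are reset.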
offset()
Get or set the offset
parameter.
If the query returns records that exceed the
limit
number,
$offset()
can be used to paginate through the available results.
EsgfQuery$offset(value = 0L)
value
The facet parameter value. Default: 0
.
There are two options:
If value
is not given, current value is returned.
An integer.
If value
is given, the modified EsgfQuery
object.
Otherwise, an EsgfQueryParam
object which is essentially a list of three elements:
value
: input values.
negate
: Whether there is a preceding !
.
name
: Parameter name.
    \dontrun{
    # get current value
    q$offset()
    # set the parameter
    q$offset(0L)
    }
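Pagination with `limit` and `offset` comes down to computing one offset per page. The sketch below shows that arithmetic in base R; `page_offsets()` is a hypothetical helper, and the commented lines at the end assume a live query object `q` and the data.table package.

```r
# Hypothetical helper: offsets needed to page through `total` records
# when each query returns at most `limit` records.
page_offsets <- function(total, limit) {
  stopifnot(total >= 0, limit > 0)
  if (total == 0) return(integer())
  as.integer(seq(from = 0, to = total - 1, by = limit))
}

offsets <- page_offsets(total = 25, limit = 10)
offsets

# With a live query object `q` (not run here), each page could be fetched as:
# pages <- lapply(offsets, function(o) q$limit(10L)$offset(o)$collect())
# result <- data.table::rbindlist(pages, fill = TRUE)
```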
distrib()
Get or set the distrib
parameter.
By default, the query is sent to all ESGF Nodes, i.e.
$distrib(TRUE)
.
$distrib(FALSE)
can be used to execute the query only on the
target node.
EsgfQuery$distrib(value = TRUE)
value
The facet parameter value. Default: TRUE
.
There are two options:
If value
is not given, current value is returned.
A flag.
If value
is given, the modified EsgfQuery
object.
Otherwise, an EsgfQueryParam
object which is essentially a list of three elements:
value
: input values.
negate
: Whether there is a preceding !
.
name
: Parameter name.
    \dontrun{
    # get current value
    q$distrib()
    # set the parameter
    q$distrib(TRUE)
    }
params()
Get or set other parameters.
$params()
can be used to specify other parameters that do not have
a dedicated method, e.g. version
, master_id
, etc. It can also be
used to overwrite existing parameter values specified using methods
like $activity_id()
.
EsgfQuery$params(...)
...
Parameter values to set. There are three options:
If not given, existing parameters that do not have a dedicated method are returned.
If NULL
, all existing parameters that do not have a dedicated
method are removed.
A named vector, e.g. $params(score = 1, table_id = "day")
will
set score
to 1
and table_id
to day
.
The !
notation can still be used to negate the constraints, e.g.
$params(table_id = !c("3hr", "day"))
searches for all table_id
except for 3hr
and day
.
If parameters are specified, the modified EsgfQuery
object,
invisibly.
Otherwise, an empty list for $params(NULL)
or a list of
EsgfQueryParam
objects.
    \dontrun{
    # get current values
    # default is an empty list (`list()`)
    q$params()
    # set the parameter
    q$params(table_id = c("3hr", "day"), member_id = "00")
    q$params()
    # reset existing parameters
    q$frequency("day")
    q$params(frequency = "mon")
    q$frequency() # frequency value has been changed using $params()
    # negating the constraints is also supported
    q$params(table_id = !c("3hr", "day"))
    # use NULL to remove all parameters
    q$params(NULL)$params()
    }
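Evaluating `!c("3hr", "day")` directly is an error in R, so the `!` notation has to be read from the unevaluated argument. The sketch below shows one way this can be done in base R; `as_query_param()` is a hypothetical helper and is not necessarily how the package implements it. It returns the three elements listed for `EsgfQueryParam` (`value`, `negate`, `name`).

```r
# Hypothetical helper: capture the argument unevaluated, strip a leading `!`,
# and return the three elements described for `EsgfQueryParam`.
as_query_param <- function(value, name) {
  expr <- substitute(value)
  negate <- is.call(expr) && identical(expr[[1L]], as.name("!"))
  if (negate) expr <- expr[[2L]]
  list(value = eval(expr, parent.frame()), negate = negate, name = name)
}

p1 <- as_query_param(c("3hr", "day"), name = "table_id")
p2 <- as_query_param(!c("3hr", "day"), name = "table_id")

p1$negate  # FALSE
p2$negate  # TRUE
p2$value   # "3hr" "day"
```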
url()
Get the URL of the actual query or of the wget script
EsgfQuery$url(wget = FALSE)
wget
Whether to return the URL of the wget script that can be
used to download all files matching the given constraints.
Default: FALSE
.
A single string.
    \dontrun{
    q$url()
    # get the wget script URL
    q$url(wget = TRUE)
    # You can download the wget script using the URL directly. For
    # example, the code below downloads the script and saves it as
    # 'wget.sh' in R's temporary folder:
    download.file(q$url(TRUE), file.path(tempdir(), "wget.sh"), mode = "wb")
    }
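To give a rough idea of what such a URL looks like, the sketch below assembles one from a host and a list of parameters. `build_query_url()` is a hypothetical helper; the `search` and `wget` endpoint names follow the ESGF search REST API, while joining multiple values with commas is an assumption, and the exact string returned by `$url()` may differ.

```r
# Hypothetical helper: assemble a search or wget URL from a host and a named
# list of parameter values. Joining values with commas is an assumption.
build_query_url <- function(host, params, wget = FALSE) {
  endpoint <- if (wget) "wget" else "search"
  values <- vapply(
    params,
    function(v) paste(as.character(v), collapse = ","),
    character(1)
  )
  encoded <- vapply(values, utils::URLencode, character(1), reserved = TRUE)
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0(host, "/", endpoint, "?", query)
}

host <- "https://esgf-node.llnl.gov/esg-search"
params <- list(project = "CMIP6", variable_id = c("tas", "pr"), limit = 10L)

search_url <- build_query_url(host, params)
wget_url <- build_query_url(host, params, wget = TRUE)
```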
count()
Send a facet-counting query and fetch the results
EsgfQuery$count(facets = TRUE)
facets
NULL
, a flag or a character vector. There are three
options:
If NULL
or FALSE
, only the total number of matched records is
returned.
If TRUE
, the value of $facets()
is used to limit the facets. This is the default value.
If a character vector, it is used to limit the facets.
If facets
equals NULL
or FALSE
, or $facets()
returns NULL
,
an integer.
Otherwise, a named list with the first element always being total
which is the total number of matched records. Other elements have
the same length as input facets and are all named integer vectors.
    \dontrun{
    # get the total number of matched records
    q$count(NULL) # or q$count(facets = FALSE)
    # count records for specific facets
    q$facets(c("activity_id", "source_id"))$count()
    # same as above
    q$count(facets = c("activity_id", "source_id"))
    }
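The two return shapes described above (a single integer, or a named list starting with `total`) can be mimicked on a toy table. The sketch below is base R only; `records` and `count_records()` are hypothetical and the counts are made up.

```r
# Toy records standing in for matched datasets.
records <- data.frame(
  activity_id = c("ScenarioMIP", "ScenarioMIP", "CFMIP"),
  source_id = c("CESM2", "BCC-CSM2-MR", "CESM2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper that mimics the documented return shape of `$count()`.
count_records <- function(records, facets = NULL) {
  total <- nrow(records)
  if (is.null(facets) || identical(facets, FALSE)) return(total)
  counts <- lapply(facets, function(f) {
    tab <- table(records[[f]])
    stats::setNames(as.integer(tab), names(tab))
  })
  c(list(total = total), stats::setNames(counts, facets))
}

count_records(records)  # a single integer
res <- count_records(records, c("activity_id", "source_id"))
names(res)              # "total" "activity_id" "source_id"
```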
collect()
Send the actual query and fetch the results
$collect()
sends the actual query to the ESGF search services and
returns the results in a data.table::data.table. The columns depend
on the value of query type and fields
parameter.
EsgfQuery$collect()
A data.table.
    \dontrun{
    q$fields("source_id")
    q$collect()
    }
response()
Get the response of last sent query
The response of the last sent query is always stored internally and
can be retrieved using $response()
. It is a named list generated
from the JSON response using jsonlite::fromJSON()
.
EsgfQuery$response()
A named list.
\dontrun{ q$response() }
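Since the stored response is a list produced by `jsonlite::fromJSON()`, it can be explored with ordinary list indexing. The sketch below parses a small hand-written, Solr-style JSON string; the field layout (`response`, `numFound`, `docs`) is an assumption based on the usual Solr reply format, not a real ESGF reply.

```r
library(jsonlite)

# Toy Solr-style JSON, hand-written for illustration (not a real ESGF reply).
txt <- '{
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "dataset-1", "source_id": "CESM2"},
      {"id": "dataset-2", "source_id": "BCC-CSM2-MR"}
    ]
  }
}'

resp <- fromJSON(txt)

resp$response$numFound  # total number of matches
resp$response$docs$id   # one entry per returned record
```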
print()
Print a summary of the current EsgfQuery
object
$print()
gives the summary of current EsgfQuery
object including
the host URL, the built time of facet cache and all query parameters.
EsgfQuery$print()
The EsgfQuery
object itself, invisibly.
\dontrun{ q$print() }
Hongyuan Jia
    ## ------------------------------------------------
    ## Method `EsgfQuery$new`
    ## ------------------------------------------------
    ## Not run:
    q <- EsgfQuery$new(host = "https://esgf-node.llnl.gov/esg-search")
    q
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$build_cache`
    ## ------------------------------------------------
    ## Not run:
    q$build_cache()
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$list_all_facets`
    ## ------------------------------------------------
    ## Not run:
    q$list_all_facets()
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$list_all_shards`
    ## ------------------------------------------------
    ## Not run:
    q$list_all_shards()
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$list_all_values`
    ## ------------------------------------------------
    ## Not run:
    q$list_all_values()
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$project`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$project()
    # set the parameter
    q$project("CMIP6")
    # negate the project constraints
    q$project(!"CMIP6")
    # remove the parameter
    q$project(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$activity_id`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$activity_id()
    # set the parameter
    q$activity_id("ScenarioMIP")
    # negate the constraints
    q$activity_id(!c("CFMIP", "ScenarioMIP"))
    # remove the parameter
    q$activity_id(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$experiment_id`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$experiment_id()
    # set the parameter
    q$experiment_id(c("ssp126", "ssp585"))
    # negate the constraints
    q$experiment_id(!c("ssp126", "ssp585"))
    # remove the parameter
    q$experiment_id(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$source_id`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$source_id()
    # set the parameter
    q$source_id(c("BCC-CSM2-MR", "CESM2"))
    # negate the constraints
    q$source_id(!c("BCC-CSM2-MR", "CESM2"))
    # remove the parameter
    q$source_id(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$variable_id`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$variable_id()
    # set the parameter
    q$variable_id(c("tas", "pr"))
    # negate the constraints
    q$variable_id(!c("tas", "pr"))
    # remove the parameter
    q$variable_id(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$frequency`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$frequency()
    # set the parameter
    q$frequency(c("1hr", "day"))
    # negate the constraints
    q$frequency(!c("1hr", "day"))
    # remove the parameter
    q$frequency(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$variant_label`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$variant_label()
    # set the parameter
    q$variant_label(c("r1i1p1f1", "r1i2p1f1"))
    # negate the constraints
    q$variant_label(!c("r1i1p1f1", "r1i2p1f1"))
    # remove the parameter
    q$variant_label(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$nominal_resolution`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$nominal_resolution()
    # set the parameter
    q$nominal_resolution(c("100 km", "1x1 degree"))
    # negate the constraints
    q$nominal_resolution(!c("100 km", "1x1 degree"))
    # remove the parameter
    q$nominal_resolution(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$data_node`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$data_node()
    # set the parameter
    q$data_node("esg.lasg.ac.cn")
    # negate the constraints
    q$data_node(!"esg.lasg.ac.cn")
    # remove the parameter
    q$data_node(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$facets`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$facets()
    # set the facets
    q$facets(c("activity_id", "source_id"))
    # use all available facets
    q$facets("*")
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$fields`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$fields()
    # set the fields
    q$fields(c("activity_id", "source_id"))
    # use all available fields
    q$fields("*")
    # remove the parameter
    # act the same as above because the default `fields` in ESGF search
    # services is `*` if `fields` is not specified
    q$fields(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$shards`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$shards()
    # set the parameter
    q$shards("localhost:8983/solr/datasets")
    # negate the constraints
    q$shards(!"localhost:8983/solr/datasets")
    # only applicable for distributed queries
    q$distrib(FALSE)$shards("localhost:8983/solr/datasets") # Error
    # remove the parameter
    q$shards(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$replica`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$replica()
    # set the parameter
    q$replica(TRUE)
    # remove the parameter
    q$replica(NULL)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$latest`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$latest()
    # set the parameter
    q$latest(TRUE)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$type`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$type()
    # set the parameter
    q$type("Dataset")
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$limit`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$limit()
    # set the parameter
    q$limit(10L)
    # `limit` is reset to 10,000 if input is greater than that
    q$limit(10001L) # warning
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$offset`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$offset()
    # set the parameter
    q$offset(0L)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$distrib`
    ## ------------------------------------------------
    ## Not run:
    # get current value
    q$distrib()
    # set the parameter
    q$distrib(TRUE)
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$params`
    ## ------------------------------------------------
    ## Not run:
    # get current values
    # default is an empty list (`list()`)
    q$params()
    # set the parameter
    q$params(table_id = c("3hr", "day"), member_id = "00")
    q$params()
    # reset existing parameters
    q$frequency("day")
    q$params(frequency = "mon")
    q$frequency() # frequency value has been changed using $params()
    # negating the constraints is also supported
    q$params(table_id = !c("3hr", "day"))
    # use NULL to remove all parameters
    q$params(NULL)$params()
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$url`
    ## ------------------------------------------------
    ## Not run:
    q$url()
    # get the wget script URL
    q$url(wget = TRUE)
    # You can download the wget script using the URL directly. For
    # example, the code below downloads the script and saves it as
    # 'wget.sh' in R's temporary folder:
    download.file(q$url(TRUE), file.path(tempdir(), "wget.sh"), mode = "wb")
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$count`
    ## ------------------------------------------------
    ## Not run:
    # get the total number of matched records
    q$count(NULL) # or q$count(facets = FALSE)
    # count records for specific facets
    q$facets(c("activity_id", "source_id"))$count()
    # same as above
    q$count(facets = c("activity_id", "source_id"))
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$collect`
    ## ------------------------------------------------
    ## Not run:
    q$fields("source_id")
    q$collect()
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$response`
    ## ------------------------------------------------
    ## Not run:
    q$response()
    ## End(Not run)

    ## ------------------------------------------------
    ## Method `EsgfQuery$print`
    ## ------------------------------------------------
    ## Not run:
    q$print()
    ## End(Not run)
extract_data()
takes an epw_cmip6_coord
object generated using
match_coord()
and extracts CMIP6 data using the coordinates and years of
interest specified.
extract_data( coord, years = NULL, unit = FALSE, out_dir = NULL, by = NULL, keep = is.null(out_dir), compress = 100 )
coord |
An |
years |
An integer vector indicating the target years to be included in
the data file. All other years will be excluded. If |
unit |
If |
out_dir |
The directory to save extracted data using |
by |
A character vector of variable names used to split data during extraction. Should be a subset of:
If |
keep |
Whether to keep extracted data in memory. Default: |
compress |
A single integer in the range 0 to 100, indicating the amount
of compression to use. Lower values mean larger file sizes. Default:
|
extract_data()
supports common calendars, including 365_day
and
360_day
, thanks to the PCICt package.
extract_data()
uses future.apply
underneath. You can use your preferred future backend to
speed up data extraction by running it in parallel. By default, extract_data()
uses the
future::sequential
backend, which runs tasks sequentially.
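As a minimal sketch of switching backends, the code below selects a `multisession` plan from the future package and runs a toy `future_lapply()` call under it. The toy workload only stands in for the per-file work; `extract_data()` itself is not called, and the choice of two workers is arbitrary.

```r
library(future)
library(future.apply)

# Switch from the default sequential backend to two background R sessions.
plan(multisession, workers = 2)

# Toy stand-in for per-file work; extract_data() itself is not called here.
squares <- future_lapply(1:4, function(i) i^2)

# Restore the sequential backend when done.
plan(sequential)

# With real inputs (not run), the same plan would apply to:
# coord <- match_coord("path_to_an_EPW")
# extract_data(coord, years = 2030:2060)
```

Setting the plan before the call is enough; code that relies on future.apply picks up whichever backend is active.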
An epw_cmip6_data
object, which is basically a list of 3 elements:
epw
: An eplusr::Epw object whose longitude and latitude are used to
extract CMIP6 data. It is the same object as created in match_coord()
meta
: A list containing basic metadata of input EPW, including city
,
state_province
, country
, latitude
and longitude
.
data
: An empty data.table::data.table()
if keep
is FALSE
or a
data.table::data.table()
of 14 columns if keep
is TRUE
:
No. | Column | Type | Description |
---|---|---|---|
1 | activity_drs | Character | Activity DRS (Data Reference Syntax) |
2 | institution_id | Character | Institution identifier |
3 | source_id | Character | Model identifier |
4 | experiment_id | Character | Root experiment identifier |
5 | member_id | Character | A compound construction from sub_experiment_id and variant_label |
6 | table_id | Character | Table identifier |
7 | lon | Double | Longitude of extracted location |
8 | lat | Double | Latitude of extracted location |
9 | dist | Double | The spherical distance in km between EPW location and grid coordinates |
10 | datetime | POSIXct | Datetime for the predicted value |
11 | variable | Character | Variable identifier |
12 | description | Character | Variable long name |
13 | units | Character | Units of variable |
14 | value | Double | The actual predicted value |
    ## Not run:
    coord <- match_coord("path_to_an_EPW")
    extract_data(coord, years = 2030:2060)
    ## End(Not run)
Create future EPW files using morphed data
future_epw( morphed, by = c("experiment", "source", "interval"), dir = ".", separate = TRUE, overwrite = FALSE, full = FALSE )
morphed |
An |
by |
A character vector of columns to be used as grouping variables when creating EPW files. Should be a subset of:
|
dir |
The parent directory to save the generated EPW files. If it does not
exist, it will be created first. Default: |
separate |
If |
overwrite |
If |
full |
If |
If full
is FALSE
, which is the default, a list of generated eplusr::Epw objects, invisibly.
Otherwise, a data.table with the following columns:
the columns specified by the by
value
epw
: a list of eplusr::Epw objects
path
: full paths of the generated EPW files
If option epwshiftr.dir
is set, use it. Otherwise, get package data storage
directory using rappdirs::user_data_dir()
.
get_data_dir()
A single string.
    options(epwshiftr.dir = tempdir())
    get_data_dir()
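The lookup order described for `get_data_dir()` (option first, user data directory second) can be sketched without the package. `resolve_data_dir()` is a hypothetical re-implementation, and its `fallback` argument stands in for the path that `rappdirs::user_data_dir()` would return.

```r
# Hypothetical re-implementation of the documented lookup order.
# `fallback` stands in for the path rappdirs::user_data_dir() would return.
resolve_data_dir <- function(fallback) {
  dir <- getOption("epwshiftr.dir")
  if (is.null(dir)) fallback else dir
}

old <- options(epwshiftr.dir = NULL)
resolve_data_dir(fallback = "fallback-dir")  # option unset: fallback is used

options(epwshiftr.dir = tempdir())
resolve_data_dir(fallback = "fallback-dir")  # option set: it takes precedence

options(old)  # restore the previous setting
```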
Get data nodes which store CMIP6 output
get_data_node(speed_test = FALSE, timeout = 3)
speed_test |
If |
timeout |
Timeout for a ping response in seconds. Default: |
A data.table::data.table()
of 2 or 3 (when speed_test
is TRUE
)
columns:
Column | Type | Description |
---|---|---|
data_node | character | Web address of data node |
status | character | Status of data node. "UP" means OK and "DOWN" means currently not available |
ping | double | Data node response in milliseconds during speed test |
    ## Not run:
    get_data_node()
    ## End(Not run)
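A typical use of the returned table is to pick a reachable node with the lowest ping. The sketch below works on a made-up data frame with the three documented columns; the node names and timings are invented, and `get_data_node()` itself is not called.

```r
# Toy result with the three documented columns (values are made up).
nodes <- data.frame(
  data_node = c("node-a.example.org", "node-b.example.org", "node-c.example.org"),
  status = c("UP", "DOWN", "UP"),
  ping = c(250, NA, 80),
  stringsAsFactors = FALSE
)

# Keep reachable nodes and put the fastest first.
up <- nodes[nodes$status == "UP", , drop = FALSE]
up <- up[order(up$ping), , drop = FALSE]
fastest <- up$data_node[1]
fastest
```

The selected name could then be passed on through the `data_node` argument of `init_cmip6_index()`.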
init_cmip6_index()
searches the CMIP6 model output files using esgf_query()
, returns a data.table::data.table()
containing the actual NetCDF file URLs
to download, and stores it in the user data directory for future use.
init_cmip6_index( activity = "ScenarioMIP", variable = c("tas", "tasmax", "tasmin", "hurs", "hursmax", "hursmin", "pr", "rsds", "rlds", "psl", "sfcWind", "clt"), frequency = "day", experiment = c("ssp126", "ssp245", "ssp370", "ssp585"), source = c("AWI-CM-1-1-MR", "BCC-CSM2-MR", "CESM2", "CESM2-WACCM", "EC-Earth3", "EC-Earth3-Veg", "GFDL-ESM4", "INM-CM4-8", "INM-CM5-0", "MPI-ESM1-2-HR", "MRI-ESM2-0"), variant = "r1i1p1f1", replica = FALSE, latest = TRUE, resolution = c("100 km", "50 km"), limit = 10000L, data_node = NULL, years = NULL, save = FALSE )
activity |
A character vector indicating activity identifiers. Default:
|
variable |
A character vector indicating variable identifiers. The 12
most related variables for EPW are set as defaults. If
|
frequency |
A character vector of sampling frequency. If
|
experiment |
A character vector indicating root experiment identifiers.
The Tier-1 experiments of activity ScenarioMIP are set as defaults.
If |
source |
A character vector indicating model identifiers. Defaults are
set to 11 sources which give outputs of all 4 experiments of activity
ScenarioMIP with daily frequency, i.e. |
variant |
A character vector indicating labels constructed from 4
indices stored as global attributes in format
|
replica |
Whether the record is the "master" copy, or a replica. Use
|
latest |
Whether the record is the latest available version, or a
previous version. Use |
resolution |
A character vector indicating approximate horizontal
resolution. Default: |
limit |
An integer indicating the maximum of matched records to return.
Should be <= 10,000. Default: |
data_node |
A character vector indicating data nodes to be queried.
Default to |
years |
An integer vector indicating the target years to be included in
the data file. All other years will be excluded. If |
save |
If |
For details on where the file index is stored, see rappdirs::user_data_dir().
A data.table::data.table with 22 columns:
No. | Column | Type | Description |
---|---|---|---|
1 | file_id | Character | Model output file universal identifier |
2 | dataset_id | Character | Dataset universal identifier |
3 | mip_era | Character | Activity's associated CMIP cycle. Will always be "CMIP6" |
4 | activity_drs | Character | Activity DRS (Data Reference Syntax) |
5 | institution_id | Character | Institution identifier |
6 | source_id | Character | Model identifier |
7 | experiment_id | Character | Root experiment identifier |
8 | member_id | Character | A compound construction from sub_experiment_id and variant_label |
9 | table_id | Character | Table identifier |
10 | frequency | Character | Sampling frequency |
11 | grid_label | Character | Grid identifier |
12 | version | Character | Approximate date of model output file |
13 | nominal_resolution | Character | Approximate horizontal resolution |
14 | variable_id | Character | Variable identifier |
15 | variable_long_name | Character | Variable long name |
16 | variable_units | Character | Units of variable |
17 | datetime_start | POSIXct | Start date and time of simulation |
18 | datetime_end | POSIXct | End date and time of simulation |
19 | file_size | Character | Model output file size in Bytes |
20 | data_node | Character | Data node to download the model output file |
21 | dataset_pid | Character | A unique string that helps identify the dataset |
22 | tracking_id | Character | A unique string that helps identify the output file |
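The datetime_start and datetime_end columns make it possible to keep only the files that cover a set of target years. The following is a minimal base R sketch of such a filter on a toy index. The helper filter_years() and the toy rows are hypothetical, and this is not necessarily how the years argument is implemented in the package.

```r
# Toy index with only the columns needed here (a real index has more columns).
index <- data.frame(
    file_id = c("f1", "f2", "f3"),
    datetime_start = as.POSIXct(c("2015-01-01", "2040-01-01", "2065-01-01"), tz = "UTC"),
    datetime_end = as.POSIXct(c("2039-12-31", "2064-12-31", "2100-12-31"), tz = "UTC"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: keep files whose simulated period overlaps any target year.
filter_years <- function(index, years) {
    yr_start <- as.integer(format(index$datetime_start, "%Y"))
    yr_end <- as.integer(format(index$datetime_end, "%Y"))
    keep <- vapply(seq_len(nrow(index)), function(i) {
        any(years >= yr_start[i] & years <= yr_end[i])
    }, logical(1))
    index[keep, , drop = FALSE]
}

kept <- filter_years(index, years = 2050:2060)
kept$file_id
```

Only the file whose period spans 2040 to 2064 overlaps the requested years, so the other two rows are dropped.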
Argument limit will only apply to the Dataset query. init_cmip6_index() will try to get all model output files which match the dataset id.
## Not run:
init_cmip6_index()
## End(Not run)
Load previously stored CMIP6 experiment output file index
load_cmip6_index(force = FALSE)
force |
If |
A data.table::data.table with 20 columns. For a detailed description of each column, see init_cmip6_index().
## Not run:
load_cmip6_index()
## End(Not run)
match_coord() takes an EPW and uses its longitude and latitude to calculate the distance between the EPW location and the global grid points in NetCDF files.
match_coord(epw, threshold = list(lon = 1, lat = 1), max_num = NULL)
epw |
Possible values:
|
threshold |
A list of 2 elements |
max_num |
The maximum number of grid points to be matched. Default is
|
match_coord() uses future.apply underneath. You can use your preferred future backend to speed up data extraction in parallel. By default, match_coord() uses the future::sequential backend, which runs everything sequentially.
An epw_cmip6_coord object, which is basically a list of 3 elements:
epw: An eplusr::Epw object parsed from the input epw argument
meta: A list containing basic metadata of the input EPW, including city, state_province, country, latitude and longitude.
coord: A data.table::data.table() which is basically the CMIP6 index database with a new list column coord appended that contains the matched latitudes and longitudes in each NetCDF file. Each element in coord is a data.table::data.table() of 6 columns describing the matched coordinates:
  index: the indices of matched coordinates
  ind_lon, ind_lat: the value indices of longitude or latitude in the NetCDF coordinate grids. These values are used to extract the corresponding variable values
  lon, lat: the actual longitude or latitude in the NetCDF coordinate grids
  dist: the distance in km between the coordinate values in NetCDF and the input EPW
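match_coord() calculates the geographical distances based on formulas of spherical trigonometry:

$$\Delta X = \cos(\phi_2)\cos(\lambda_2) - \cos(\phi_1)\cos(\lambda_1)$$
$$\Delta Y = \cos(\phi_2)\sin(\lambda_2) - \cos(\phi_1)\sin(\lambda_1)$$
$$\Delta Z = \sin(\phi_2) - \sin(\phi_1)$$
$$C_h = \sqrt{(\Delta X)^2 + (\Delta Y)^2 + (\Delta Z)^2}$$

where $\phi$ is the latitude and $\lambda$ is the longitude. This formula treats the Earth as a sphere. The geographical distance between points on the surface of a spherical Earth is $D = R C_h$, where $R$ is the radius of the Earth.
For more details, please see this Wikipedia article.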
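As an illustration of a spherical-Earth distance, the sketch below computes the straight-line chord between two points on a unit sphere and scales it by an assumed mean Earth radius. The function name chord_distance(), the radius of 6371 km and the grid points are assumptions made for this example. Whether the package uses exactly this form internally is not confirmed here.

```r
# Chord distance between two points on a spherical Earth, in km.
# `radius_km = 6371` is an assumed mean Earth radius, not a package constant.
chord_distance <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
    to_rad <- function(deg) deg * pi / 180
    phi1 <- to_rad(lat1)
    lam1 <- to_rad(lon1)
    phi2 <- to_rad(lat2)
    lam2 <- to_rad(lon2)

    # Differences of the Cartesian coordinates on the unit sphere
    dx <- cos(phi2) * cos(lam2) - cos(phi1) * cos(lam1)
    dy <- cos(phi2) * sin(lam2) - cos(phi1) * sin(lam1)
    dz <- sin(phi2) - sin(phi1)

    radius_km * sqrt(dx^2 + dy^2 + dz^2)
}

# Distances from an EPW-like location to a few hypothetical grid points
grid <- data.frame(lat = c(34.0, 34.5, 35.0), lon = c(-118.5, -118.0, -117.5))
grid$dist <- chord_distance(33.93, -118.4, grid$lat, grid$lon)
grid[order(grid$dist), ]
```

Sorting by dist puts the closest grid point first, which is the kind of ordering a max_num limit would rely on.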
## Not run:
# download an EPW from EnergyPlus website
epw <- eplusr::download_weather("los angeles.*TMY3", dir = tempdir(), type = "EPW", ask = FALSE)

match_coord(epw, threshold = list(lon = 1.0, lat = 1.0))
## End(Not run)
morphing_epw() takes an epw_cmip6_data object generated using extract_data() and calculates future core EPW weather variables using the morphing method.
morphing_epw( data, years = NULL, labels = NULL, methods = NULL, warning = FALSE )
data |
An |
years |
An integer vector indicating the target years to be considered.
If |
labels |
A character or factor vector used for grouping input |
methods |
A named character vector giving the morphing method for
each variable. Possible variable names are |
warning |
If |
The EPW weather variables that get morphed are listed in details.
An epw_cmip6_morphed object, which is basically a list of 12 elements:
No. | Element | Type | Morphing Method | Description |
---|---|---|---|---|
1 | epw | eplusr::Epw | N/A | The original EPW file used for morphing |
2 | tdb | data.table::data.table() | Stretch | Data of dry-bulb temperature after morphing |
3 | tdew | data.table::data.table() | Derived | Data of dew-point temperature after morphing |
4 | rh | data.table::data.table() | Stretch | Data of relative humidity after morphing |
5 | p | data.table::data.table() | Stretch | Data of atmospheric pressure after morphing |
6 | hor_ir | data.table::data.table() | Stretch | Data of horizontal infrared radiation from the sky after morphing |
7 | glob_rad | data.table::data.table() | Stretch | Data of global horizontal radiation after morphing |
8 | norm_rad | data.table::data.table() | Derived | Data of direct normal radiation after morphing |
9 | diff_rad | data.table::data.table() | Stretch | Data of diffuse horizontal radiation after morphing |
10 | wind | data.table::data.table() | Stretch | Data of wind speed after morphing |
11 | total_cover | data.table::data.table() | Derived | Data of total sky cover after morphing |
12 | opaque_cover | data.table::data.table() | Derived | Data of opaque sky cover after morphing |
Each data.table::data.table() listed above contains the 19 columns below, or is an empty data.table::data.table() if the corresponding variables cannot be found in the input epw_cmip6_data object.
No. | Column | Type | Description |
---|---|---|---|
1 | activity_drs | Character | Activity DRS (Data Reference Syntax) |
2 | institution_id | Character | Institution identifier |
3 | source_id | Character | Model identifier |
4 | experiment_id | Character | Root experiment identifier |
5 | member_id | Character | A compound construction from sub_experiment_id and variant_label |
6 | table_id | Character | Table identifier |
7 | lon | Double | The averaged values of input longitude |
8 | lat | Double | The averaged values of input latitude |
9 | dist | Double | The averaged spherical distances in km between EPW location and grid coordinates |
10 | interval | Factor | The label value used to average raw input data |
11 | datetime | POSIXct | The datetime value with fake year generated by calling the Epw$data() method with the input EPW |
12 | year | Integer | The original year of the raw EPW data |
13 | month | Integer | The month value of the morphed data |
14 | day | Integer | The day of the morphed data |
15 | hour | Integer | The hour of the morphed data |
16 | minute | Integer | The minute of the morphed data |
17 | Variable Name | Double | The morphed data, where Variable Name is the corresponding EPW weather variable name |
18 | delta | Double | The shift factor. Will be NA for derived values |
19 | alpha | Double | The stretch factor. Will be NA for derived values |
Morphing is an algorithm proposed by Belcher et al. (2005) that transforms present-day observed weather files (here the EPWs) into future climate weather files. The EPW data is used as the 'baseline climate'.
The first step before morphing is to calculate the monthly means of climatological variables in the EPW file, denoted by $\langle x_0 \rangle_m$. The subscript '0' denotes the present-day weather record, and 'm' denotes the month.
The morphing involves three generic operations, i.e. 1) a shift; 2) a linear stretch (scaling factor); and 3) a shift and a stretch:
If using a shift, for each month, a shift $\Delta x_m$ is applied to $x_0$. $\Delta x_m$ is the absolute change in the monthly mean value of the variable for the month $m$, i.e. $x = x_0 + \Delta x_m$. Here the monthly variance of the variable is unchanged.
If using a stretch, for each month, a stretch $\alpha_m$ is applied to $x_0$, where $\alpha_m$ is the fractional change in the monthly-mean value of a variable, i.e. $x = \alpha_m x_0$. In this case, the variance will be multiplied by $\alpha_m^2$.
When using a combined shift and stretch, i.e. $x = x_0 + \Delta x_m + \alpha_m (x_0 - \langle x_0 \rangle_m)$, both the mean and the variance are changed.
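The three operations can be tried out on a toy month of data. The sketch below follows the forms given by Belcher et al. (2005): a shift adds the monthly change, a stretch multiplies by the monthly factor, and the combined form adds the shift and scales the deviation from the monthly mean. The function morph() and the sample values are hypothetical and are not part of the package API.

```r
# Hypothetical helper applying one morphing operation to one month of values.
# x0: present-day values; delta: monthly shift; alpha: monthly stretch factor.
morph <- function(x0, delta = 0, alpha = 0,
                  method = c("shift", "stretch", "combined")) {
    method <- match.arg(method)
    switch(method,
        shift = x0 + delta,
        stretch = alpha * x0,
        combined = x0 + delta + alpha * (x0 - mean(x0))
    )
}

# Made-up present-day values for a single month
x0 <- c(18.2, 19.5, 21.0, 23.4, 22.1, 20.3)

shifted <- morph(x0, delta = 1.5, method = "shift")
stretched <- morph(x0, alpha = 1.1, method = "stretch")
combined <- morph(x0, delta = 1.5, alpha = 0.1, method = "combined")

# Compare the mean and variance of each result with the baseline
rbind(
    baseline = c(mean = mean(x0), var = var(x0)),
    shift = c(mean = mean(shifted), var = var(shifted)),
    stretch = c(mean = mean(stretched), var = var(stretched)),
    combined = c(mean = mean(combined), var = var(combined))
)
```

The shift leaves the variance untouched, the stretch scales it by the square of the factor, and the combined form moves the mean by the shift while widening the spread around it.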
For more details about morphing, please see Belcher et al. (2005).
Belcher, S., Hacker, J., Powell, D., 2005. Constructing design weather data for future climates. Building Services Engineering Research and Technology 26, 49–61. https://doi.org/10.1191/0143624405bt112oa
set_cmip6_index() takes a data.table::data.table() as input and sets it as the current index.
set_cmip6_index(index, save = FALSE)
index |
A |
save |
If |
set_cmip6_index() is useful when init_cmip6_index() returns more cases than you need and only some of them are of interest.
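A typical workflow is to subset the index to the cases of interest and register the result. The sketch below shows the subsetting step on a toy data frame with three of the index columns. The rows are made up, and the final call is left commented out because it needs epwshiftr loaded and a real index stored as a data.table.

```r
# Toy stand-in for an index; a real one comes from init_cmip6_index().
index <- data.frame(
    source_id = c("EC-Earth3", "EC-Earth3", "MRI-ESM2-0", "CESM2"),
    experiment_id = c("ssp126", "ssp585", "ssp585", "ssp585"),
    variable_id = c("tas", "tas", "tas", "hurs"),
    stringsAsFactors = FALSE
)

# Keep only near-surface air temperature under the SSP5-8.5 experiment
subset_index <- index[index$experiment_id == "ssp585" & index$variable_id == "tas", ]
subset_index

# With epwshiftr loaded and a real index, the subset would then be registered:
# set_cmip6_index(subset_index, save = FALSE)
```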
summary_database() scans the directory specified and returns a data.table() containing summary information about all the CMIP6 files available against the output file index loaded using load_cmip6_index().
summary_database(
  dir,
  by = c("activity", "experiment", "variant", "frequency", "variable", "source", "resolution"),
  mult = c("skip", "latest"),
  append = FALSE,
  miss = c("keep", "overwrite"),
  recursive = FALSE,
  update = FALSE,
  warning = TRUE
)
dir |
A single string indicating the directory where CMIP6 model output NetCDF files are stored. |
by |
The grouping columns used to summarize the database status. Should be a subset of:
|
mult |
Actions when multiple files match the same case in the CMIP6
index. If |
append |
If |
miss |
Actions when matched files in the previous summary do not exist
when running current summary. Only applicable when |
recursive |
If |
update |
If |
warning |
If |
The database here can be any directory that stores the NetCDF files for CMIP6 GCMs. It can also be the same as get_data_dir(), where epwshiftr stores the output file index, if you want to save the output file index and output files in the same place.
summary_database() uses the tracking_id, datetime_start and datetime_end global attributes of each NetCDF file to match against the output file index. Therefore, the names of the NetCDF files need not follow the CMIP6 file name encoding.
summary_database() will append 5 columns to the CMIP6 output file index:
file_path: the full path of the matched NetCDF file for every case.
summary_database() uses future.apply underneath to speed up the data processing if applicable. You can use your preferred future backend to speed up data extraction in parallel. By default, summary_database() uses the future::sequential backend, which runs everything sequentially.
A data.table::data.table() containing the corresponding grouping columns plus:
Column | Type | Description |
---|---|---|
datetime_start | POSIXct | Start date and time of simulation |
datetime_end | POSIXct | End date and time of simulation |
file_num | Integer | Total number of files per group |
file_size | Units (Mbytes) | Approximate total size of files |
dl_num | Integer | Total number of files downloaded |
dl_percent | Units (%) | Total percentage of files downloaded |
dl_size | Units (Mbytes) | Total size of files downloaded |
Also, 2 extra data.table::data.table() objects are attached as attributes:
not_found: A data.table::data.table() that contains metadata for those CMIP6 outputs that are listed in the current CMIP6 output file index but whose existing file paths are no longer valid and cannot be found in the current database.
not_matched: A data.table::data.table() that contains metadata for those CMIP6 output files that are found in the current database but not listed in the current CMIP6 output file index.
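The matching between files on disk and the index can be pictured as a join on file metadata. The base R sketch below is a simplified illustration that matches on tracking_id only and uses made-up rows. The object names missing_files and unindexed_files are hypothetical analogues of the not_found and not_matched attributes, and the real function also compares datetime_start and datetime_end.

```r
# Toy index: three expected files with sizes in Mbytes (values are made up)
index <- data.frame(
    tracking_id = c("hdl:A", "hdl:B", "hdl:C"),
    file_size = c(100, 200, 300),
    stringsAsFactors = FALSE
)

# Toy scan result: tracking ids read from NetCDF files found in a directory
found <- data.frame(
    tracking_id = c("hdl:A", "hdl:C", "hdl:X"),
    file_path = c("/data/a.nc", "/data/c.nc", "/data/x.nc"),
    stringsAsFactors = FALSE
)

# Files that are both indexed and present on disk
matched <- merge(index, found, by = "tracking_id")

# Indexed but not on disk, and on disk but not indexed
missing_files <- index[!index$tracking_id %in% found$tracking_id, ]
unindexed_files <- found[!found$tracking_id %in% index$tracking_id, ]

# Download statistics in the spirit of the summary columns
dl_num <- nrow(matched)
dl_percent <- 100 * dl_num / nrow(index)
dl_size <- sum(matched$file_size)
```

Because matching relies on metadata rather than file names, a renamed file such as /data/a.nc would still be recognised as long as its tracking id is unchanged.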
For the meaning of grouping columns, see init_cmip6_index().
## Not run:
# `dir` has no default; "path/to/cmip6" is a placeholder for your NetCDF directory
summary_database("path/to/cmip6")

summary_database("path/to/cmip6", by = "experiment")
## End(Not run)