| MulticoreParam-class {BiocParallel} | R Documentation |
Enable multi-core parallel evaluation
Description
This class is used to parameterize single computer multicore parallel
evaluation on non-Windows computers. multicoreWorkers() chooses
the number of workers.
Usage
## constructor
## ------------------------------------
MulticoreParam(workers = multicoreWorkers(), tasks = 0L,
               stop.on.error = TRUE,
               progressbar = FALSE, RNGseed = NULL,
               timeout = WORKER_TIMEOUT, exportglobals = TRUE,
               log = FALSE, threshold = "INFO", logdir = NA_character_,
               resultdir = NA_character_, jobname = "BPJOB",
               force.GC = FALSE, fallback = TRUE,
               manager.hostname = NA_character_, manager.port = NA_integer_,
               ...)
## detect workers
## ------------------------------------
multicoreWorkers()
Arguments
workers |
|
tasks |
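In this documentation a job is defined as a single call to a function, such
as bplapply or bpmapply. By default (tasks = 0L) X is divided evenly over the
workers; setting tasks to the length of X runs each element of X as a
separate task. |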
stop.on.error |
|
progressbar |
|
RNGseed |
|
timeout |
|
exportglobals |
|
log |
|
threshold |
|
logdir |
|
resultdir |
|
jobname |
|
force.GC |
|
fallback |
|
manager.hostname |
|
manager.port |
|
... |
Additional arguments passed to makeCluster. |
Details
MulticoreParam is used for shared memory computing. Under the hood
the cluster is created with makeCluster(..., type ="FORK") from
the parallel package.
See ?BIOCPARALLEL_WORKER_NUMBER to control the default and
maximum number of workers.
A FORK transport starts workers with the mcfork function and
communicates between master and workers using socket connections.
mcfork builds on fork() and thus a Linux cluster is not supported.
Because FORK clusters are POSIX based, they are not supported on
Windows. When MulticoreParam is created or used on Windows it
defaults to SerialParam, which is the equivalent of using a
single worker.
- error handling:
-
When stop.on.error = FALSE, all computations are attempted and partial results are returned with any error messages.
  - stop.on.error: A logical. Stops all jobs as soon as one job fails (TRUE) or waits for all jobs to terminate (FALSE). When FALSE, the return value is a list of successful results along with error messages as 'conditions'.
  - The bpok(x) function returns a logical() vector that is FALSE for any jobs that threw an error. The input x is a list output from a bp*apply function such as bplapply or bpmapply.
- logging:
-
When log = TRUE the futile.logger package is loaded on the workers. All log messages written in the futile.logger format are captured by the logging mechanism and returned in real-time (i.e., as each task completes) instead of after all jobs have finished.

Messages sent to stdout and stderr are returned to the workspace by default. When log = TRUE these are diverted to the log output. Those familiar with the outfile argument to makeCluster can think of log = FALSE as equivalent to outfile = NULL; providing a logdir is the same as providing a name for outfile except that BiocParallel writes a log file for each task.

The log output includes additional statistics such as memory use and task runtime. Memory use is computed by calling gc(reset=TRUE) before code evaluation and gc() (no reset) after. The output of the second gc() call is sent to the log file.
- log and result files:
-
Results and logs can be written to a file instead of returned to the workspace. Writing to files is done from the master as each task completes. Options can be set with the logdir and resultdir fields in the constructor or with the accessors, bplogdir and bpresultdir.
- random number generation:
-
For MulticoreParam, SnowParam, and SerialParam, random number generation is controlled through the RNGseed = argument. BiocParallel uses the L'Ecuyer-CMRG random number generator described in the parallel package to generate independent random number streams. One stream is associated with each element of X, and used to seed the random number stream for the application of FUN() to X[[i]]. Thus setting RNGseed = ensures reproducibility across MulticoreParam(), SnowParam(), and SerialParam(), regardless of worker or task number. The default value RNGseed = NULL means that each evaluation of bplapply proceeds independently.

For details of the L'Ecuyer generator, see ?clusterSetRNGStream.
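The reproducibility described above can be checked with a small sketch. It assumes a non-Windows machine with BiocParallel installed; the seed value, the helper draw(), and the variable names are arbitrary choices made for illustration and do not come from the package.

```r
library(BiocParallel)

## Hypothetical seed, chosen for illustration only.
seed <- 123L

## Each call draws one normal deviate from the stream assigned to X[[i]].
draw <- function(i) rnorm(1)

## Same seed, different back-ends and worker counts.
multi  <- bplapply(1:4, draw, BPPARAM = MulticoreParam(2, RNGseed = seed))
serial <- bplapply(1:4, draw, BPPARAM = SerialParam(RNGseed = seed))

## Per the description above, the two result lists should agree.
same <- identical(multi, serial)
same
```

Leaving RNGseed = NULL in either constructor would make each bplapply call proceed independently, so the comparison would no longer be expected to hold.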
Constructor
- MulticoreParam(workers = multicoreWorkers(), tasks = 0L, stop.on.error = TRUE, progressbar = FALSE, RNGseed = NULL, timeout = WORKER_TIMEOUT, exportglobals = TRUE, log = FALSE, threshold = "INFO", logdir = NA_character_, resultdir = NA_character_, manager.hostname = NA_character_, manager.port = NA_integer_, ...):
-
Return an object representing a FORK cluster. The cluster is not created until bpstart is called. Named arguments in ... are passed to makeCluster.
Accessors: Logging and results
In the following code, x is a MulticoreParam object.
- bpprogressbar(x), bpprogressbar(x) <- value:
-
Get or set the value to enable text progress bar. value must be a logical(1).
- bpjobname(x), bpjobname(x) <- value:
-
Get or set the job name.
- bpRNGseed(x), bpRNGseed(x) <- value:
-
Get or set the seed for random number generation. value must be a numeric(1) or NULL.
- bplog(x), bplog(x) <- value:
-
Get or set the value to enable logging. value must be a logical(1).
- bpthreshold(x), bpthreshold(x) <- value:
-
Get or set the logging threshold. value must be a character(1) string of one of the levels defined in the futile.logger package: "TRACE", "DEBUG", "INFO", "WARN", "ERROR", or "FATAL".
- bplogdir(x), bplogdir(x) <- value:
-
Get or set the directory for the log file. value must be a character(1) path, not a file name. The file is written out as LOGFILE.out. If no logdir is provided and bplog = TRUE, log messages are sent to stdout.
- bpresultdir(x), bpresultdir(x) <- value:
-
Get or set the directory for the result files. value must be a character(1) path, not a file name. Separate files are written for each job with the prefix JOB (e.g., JOB1, JOB2, etc.). When no resultdir is provided the results are returned to the session as a list.
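A minimal sketch of the getter and setter pairs listed above, assuming BiocParallel is installed. The object name param, the seed value, and the temporary log directory are illustrative choices only.

```r
library(BiocParallel)

param <- MulticoreParam(workers = 2)

## Getters return the current field values.
bplog(param)
bpjobname(param)

## Setters use the usual R replacement-function form.
bplog(param) <- TRUE            # enable logging first
bpthreshold(param) <- "DEBUG"   # one of the futile.logger levels
bpprogressbar(param) <- TRUE
bpRNGseed(param) <- 42          # hypothetical seed value

## Hypothetical directory for log files; it must be a path, not a file name.
log_dir <- tempfile("bplogs")
dir.create(log_dir)
bplogdir(param) <- log_dir

param
```

Logging is switched on before the log directory is set, following the order used in the Examples section of this page.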
Accessors: Back-end control
In the code below x is a MulticoreParam object. See the
?BiocParallelParam man page for details on these accessors.
- bpworkers(x)
- bpnworkers(x)
- bptasks(x), bptasks(x) <- value
- bpstart(x)
- bpstop(x)
- bpisup(x)
- bpbackend(x), bpbackend(x) <- value
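The following sketch shows one way the start and stop accessors might be used to keep a set of workers alive across several calls. It assumes a non-Windows machine with BiocParallel installed; the variable names are illustrative only.

```r
library(BiocParallel)

param <- MulticoreParam(workers = 2)

## The cluster is not created by the constructor.
up_before <- bpisup(param)

## Start the workers explicitly so that later calls can reuse them.
param <- bpstart(param)
up_running <- bpisup(param)

squares <- bplapply(1:4, function(i) i^2, BPPARAM = param)
roots   <- bplapply(1:4, sqrt, BPPARAM = param)

## Shut the workers down when finished.
param <- bpstop(param)
up_after <- bpisup(param)

c(before = up_before, running = up_running, after = up_after)
```

When bpstart is not called, the evaluation functions are expected to start and stop the workers around each call, so explicit control mainly pays off when many short calls are made in sequence.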
Accessors: Error Handling
In the code below x is a MulticoreParam object. See the
?BiocParallelParam man page for details on these accessors.
- bpstopOnError(x), bpstopOnError(x) <- value
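As an illustration of this accessor, the sketch below turns off stop-on-error and then separates successful results from failures with bptry and bpok, both of which appear elsewhere on this page. It assumes BiocParallel is installed; the input list and variable names are made up for the example.

```r
library(BiocParallel)

param <- MulticoreParam(workers = 2)
bpstopOnError(param) <- FALSE    # attempt every element
bpstopOnError(param)

## The second element is a character string, so sqrt() raises an error.
res <- bptry(bplapply(list(4, "two", 9), sqrt, BPPARAM = param))

## TRUE for elements that succeeded, FALSE for those that failed.
ok <- bpok(res)
ok

## Successful results, and the messages attached to the failures.
res[ok]
vapply(res[!ok], conditionMessage, character(1))
```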
Methods: Evaluation
In the code below BPPARAM is a MulticoreParam object.
Full documentation for these functions is on separate man pages: see
?bpmapply, ?bplapply, ?bpvec, ?bpiterate and
?bpaggregate.
- bpmapply(FUN, ..., MoreArgs=NULL, SIMPLIFY=TRUE, USE.NAMES=TRUE, BPPARAM=bpparam())
- bplapply(X, FUN, ..., BPPARAM=bpparam())
- bpvec(X, FUN, ..., AGGREGATE=c, BPPARAM=bpparam())
- bpiterate(ITER, FUN, ..., BPPARAM=bpparam())
- bpaggregate(x, data, FUN, ..., BPPARAM=bpparam())
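A short sketch of three of these methods with a MulticoreParam back-end, assuming BiocParallel is installed. The inputs and variable names are illustrative; bpiterate and bpaggregate are left out here and are covered on their own man pages.

```r
library(BiocParallel)

param <- MulticoreParam(workers = 2)

## bplapply: apply FUN to each element of X and return a list.
tens <- bplapply(1:4, function(i) i * 10, BPPARAM = param)

## bpmapply: apply FUN over several arguments in parallel.
sums <- bpmapply(function(x, y) x + y, 1:3, 4:6, BPPARAM = param)

## bpvec: split X into chunks, apply a vectorized FUN, and recombine.
roots <- bpvec(1:10, sqrt, BPPARAM = param)

tens
sums
roots
```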
Methods: Other
In the code below x is a MulticoreParam object.
- show(x):
-
Displays the MulticoreParam object.
Global Options
See the 'Global Options' section of SnowParam for
manager host name and port defaults.
Author(s)
Martin Morgan <mtmorgan@fhcrc.org> and Valerie Obenchain
See Also
- register for registering parameter classes for use in parallel evaluation.
- SnowParam for computing in distributed memory
- DoparParam for computing with foreach
- SerialParam for non-parallel evaluation
Examples
## -----------------------------------------------------------------------
## Job configuration:
## -----------------------------------------------------------------------
## MulticoreParam supports shared memory computing. The object fields
## control the division of tasks, error handling, logging and
## result format.
bpparam <- MulticoreParam()
bpparam
## By default the param is created with the maximum available workers
## determined by multicoreWorkers().
multicoreWorkers()
## Fields are modified with accessors of the same name:
bplog(bpparam) <- TRUE
dir.create(resultdir <- tempfile())
bpresultdir(bpparam) <- resultdir
bpparam
## -----------------------------------------------------------------------
## Logging:
## -----------------------------------------------------------------------
## When 'log == TRUE' the workers use a custom script (in BiocParallel)
## that enables logging and access to other job statistics. Log messages
## are returned as each job completes rather than waiting for all to finish.
## In 'fun', a value of 'x = 1' will throw a warning, 'x = 2' is ok
## and 'x = 3' throws an error. Because 'x = 1' sleeps, the warning
## should return after the error.
X <- 1:3
fun <- function(x) {
    if (x == 1) {
        Sys.sleep(2)
        sqrt(-x)       ## warning
        x
    } else if (x == 2) {
        x              ## ok
    } else if (x == 3) {
        sqrt("FOO")    ## error
    }
}
## By default logging is off. Turn it on with the bplog()<- setter
## or by specifying 'log = TRUE' in the constructor.
bpparam <- MulticoreParam(3, log = TRUE, stop.on.error = FALSE)
res <- tryCatch({
bplapply(X, fun, BPPARAM=bpparam)
}, error=identity)
res
## When a 'logdir' location is given the messages are redirected to a file:
## Not run:
bplogdir(bpparam) <- tempdir()
bplapply(X, fun, BPPARAM = bpparam)
list.files(bplogdir(bpparam))
## End(Not run)
## -----------------------------------------------------------------------
## Managing results:
## -----------------------------------------------------------------------
## By default results are returned as a list. When 'resultdir' is given,
## results are saved to files in that directory, one per job, e.g.,
## 'TASK1.Rda', 'TASK2.Rda', etc.
## Not run:
dir.create(resultdir <- tempfile())
bpparam <- MulticoreParam(2, resultdir = resultdir, stop.on.error = FALSE)
bplapply(X, fun, BPPARAM = bpparam)
list.files(bpresultdir(bpparam))
## End(Not run)
## -----------------------------------------------------------------------
## Error handling:
## -----------------------------------------------------------------------
## When 'stop.on.error' is TRUE the job is terminated as soon as an
## error is hit. When FALSE, all computations are attempted and partial
## results are returned along with errors. In this example the number of
## 'tasks' is set to equal the length of 'X' so each element is run
## separately. (Default behavior is to divide 'X' evenly over workers.)
## All results along with error:
bpparam <- MulticoreParam(2, tasks = 4, stop.on.error = FALSE)
res <- bptry(bplapply(list(1, "two", 3, 4), sqrt, BPPARAM = bpparam))
res
## Calling bpok() on the result list returns TRUE for elements with no error.
bpok(res)
## -----------------------------------------------------------------------
## Random number generation:
## -----------------------------------------------------------------------
## Random number generation is controlled with the 'RNGseed' field.
## This seed is passed to parallel::clusterSetRNGStream
## which uses the L'Ecuyer-CMRG random number generator and distributes
## streams to members of the cluster.
bpparam <- MulticoreParam(3, RNGseed = 7739465)
bplapply(seq_len(bpnworkers(bpparam)), function(i) rnorm(1), BPPARAM = bpparam)