Optional Type Specification in R
In the last year I have given a few talks on reflection and the power
it gives us to integrate functionality from other languages and
systems into R (and other statistical systems).
Additionally, I have illustrated how we can make our own functionality
available to other systems. The RDCOMServer and RDCOMEvents packages are examples
of how we can publish servers whose methods are implemented via
R functions. The SSOAP package is another example of this style of
service.
An important aspect that is missing from the way we publish servers is
information about the types of the arguments a method expects, what
type of object it will return and what sort of exceptions it can
raise.[1] Other client systems (Perl,
Python, Visual Basic) use this information, in the same way we do, to
provide compiled bindings to servers. While we can take advantage of
other servers' type specifications, we do not provide them for our own.
This is partly because R is a dynamic language with run-time typing. But
when we publish a server, we have definite expectations about its
inputs and its outputs. We may accept different combinations of
inputs, and this complicates matters. But often we expect a small
set of input combinations, and the return type is an element of a small set of
possible types.
The same considerations apply when we are developing regular R
packages, and even individual functions, rather than thinking about distributed
computing via client-server models and reflection. Type checking is
a very valuable aspect of compiled languages. The compiler tells us
what we have gotten wrong before we run the code and helps us to fix
things. Many of the errors are simple and amount to typographical
mistakes. Others, however, are more conceptual, and the compiler helps
us to identify them.
We can perform type checking in R by adding validation
code to the beginning of the function.
This code can check the different permissible combinations of inputs,
a luxury of run-time type discovery:
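f = function(x, y, z) {
  # Accept either (integer, integer) or (matrix, <missing>, logical).
  # missing() is tested first so that an absent argument is never evaluated.
  intCase = !missing(x) && !missing(y) && is(x, "integer") && is(y, "integer")
  matrixCase = !missing(x) && missing(y) && !missing(z) &&
                 is(x, "matrix") && is(z, "logical")
  if(!intCase && !matrixCase)
    stop("Expecting either (integer, integer) or (matrix, <missing>, logical) as inputs")
  # ... the body of the function ...
}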
In addition to simple type-checking, we can perform
more content-based checking.
For example, we can verify that x and y have the same length.
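Here is a minimal sketch of such hand-written checks, using a hypothetical function addVectors(). The first test checks the types and the second checks the content.

```r
library(methods)

# Hypothetical example: hand-written checks at the top of a function.
addVectors = function(x, y) {
  # Type check: both arguments must be integer vectors.
  if(!is(x, "integer") || !is(y, "integer"))
    stop("x and y must both be integer vectors")
  # Content check: the two vectors must have the same length.
  if(length(x) != length(y))
    stop("x and y must have the same length")
  x + y
}

addVectors(1:3, 4:6)     # 5 7 9
# addVectors(1:3, 1:2)   # error: x and y must have the same length
```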
While we can do these tests in each function, it would be more
convenient to have the interpreter do them itself, leaving our
functions simpler and easier to read. If we can
associate the expected types with a function, then the interpreter can validate that
the arguments match one of its signatures.
We can simply describe the allowed combinations of types using a list of named character vectors, one per signature:
signatures =
list(
c(x = "integer", y = "integer"),
c(x = "matrix", z = "logical")
)
When evaluating the function, the interpreter can
access this information and compare the arguments against it:
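env = environment()   # the frame of the function being checked
validated = FALSE
for(sig in signatures) {
  # A signature matches if every argument it names was supplied
  # and is an instance of the required class.
  ok = all(sapply(names(sig), function(el)
         !eval(call("missing", as.name(el)), env) &&
           is(get(el, envir = env), sig[[el]])))
  if(ok) {
    validated = TRUE
    break
  }
}
if(!validated)
  stop("Unexpected call signature")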
We can associate the signatures with the function
by registering them as meta-data in some way. Attaching them via
attributes is one way. Writing them to a separate meta-data object is another;
that separates the function from the specification, but this is perfectly
fine.
We need to ensure that the signatures object is actually used. The function
does not necessarily have a variable that refers to itself,
but it can use sys.function() to access itself:
attr(sys.function(), "signatures")
and then it can validate the call.
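Here is a minimal sketch of both ways of associating the signatures, assuming the signatures list shown earlier. The function f reads its own attribute via sys.function(), and typeInfo is a hypothetical separate registry: a plain environment keyed by function name.

```r
signatures = list(c(x = "integer", y = "integer"),
                  c(x = "matrix", z = "logical"))

f = function(x, y, z) {
  # sys.function() returns the closure being evaluated, attributes included.
  sigs = attr(sys.function(), "signatures")
  length(sigs)
}

# Option 1: attach the signatures to the function as an attribute.
attr(f, "signatures") = signatures

# Option 2: keep them in a separate meta-data object, here a
# hypothetical registry environment keyed by function name.
typeInfo = new.env()
assign("f", signatures, envir = typeInfo)

f(1L, 2L)                   # returns 2
get("f", envir = typeInfo)  # the same list of signatures
```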
Of course, it is nicer if the interpreter does this for us.
It leaves the original code as-is, without requiring
the validation to be part of the body.
And if we think about it, the return value is very
hard to check this way (without redefining the return function!).
When the interpreter evaluates a call to a function,
it needs to check whether the function has type information.
If so, it validates the arguments against it.
Assuming there is no error, it evaluates the
body of the function in the usual way.
When the function returns control normally,
i.e. by an implicit or explicit call to return(),
the evaluator needs to check the type of the return value against
those specified by the signatures.
We can do this rapidly in C code,
and this is probably the appropriate place to do it.
Alternatively, we can modify the do_eval routine
to pass control back to another R function
that performs these steps.
We can, in fact, do both by giving
the internal SEXP a pointer
to a routine that actually
performs the call, along with the validation.
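These steps can be emulated at the R level with a wrapper function. This is roughly what handing control to an R function from the evaluator would amount to, but it is only a sketch and not a proposal for the internal mechanism. withTypeChecks() is a hypothetical name. It assumes each signature is stored as a list with an args element (a named character vector of classes) and a returns element (the permitted return classes). It forces every argument up front.

```r
library(methods)

# Hypothetical wrapper: check the arguments, call 'fun', then check the
# return value against the signature that matched.
withTypeChecks = function(fun, signatures) {
  force(fun)
  force(signatures)
  function(...) {
    callerEnv = parent.frame()
    # Match the actual call to fun's formals and force every supplied
    # argument now (lazy evaluation is given up here).
    mc = match.call(fun, sys.call())
    vals = lapply(as.list(mc)[-1], eval, envir = callerEnv)

    argsMatch = function(sig)
      all(vapply(names(sig$args), function(nm)
            nm %in% names(vals) && is(vals[[nm]], sig$args[[nm]]),
          logical(1)))

    matched = Find(argsMatch, signatures)
    if(is.null(matched))
      stop("Unexpected call signature", call. = FALSE)

    # quote = TRUE stops do.call() from evaluating the values a second time.
    result = do.call(fun, vals, quote = TRUE)

    if(!any(vapply(matched$returns, function(cl) is(result, cl), logical(1))))
      stop("Unexpected return type: ", class(result)[1], call. = FALSE)
    result
  }
}

addInts = withTypeChecks(function(x, y) x + y,
  list(list(args = c(x = "integer", y = "integer"), returns = "integer")))

addInts(1L, 2L)   # 3
# addInts(1, 2)   # error: Unexpected call signature (1 and 2 are doubles)
```

Forcing everything in the wrapper corresponds to the eager choice discussed under lazy evaluation. A C-level implementation could make a different choice.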
We can also attach information about the return type of the function.
Each of the signatures has an expected return type,
or a collection of possible types. These might depend
on the actual content of the inputs, but this is not
very common.
The S4 mechanism
allows us to do this (although we do have to make functions generic to
use it). This amounts to specifying a signature along with a return
type.
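Here is a small sketch of the S4 route, using a hypothetical generic combineArgs() with the two signatures used earlier. Each setMethod() call pairs a signature with an implementation. The class "missing" is how an S4 signature expresses an omitted argument. The valueClass argument of setGeneric() records the return class we expect.

```r
library(methods)

# Hypothetical generic; valueClass records the class we expect it to return.
setGeneric("combineArgs", function(x, y) standardGeneric("combineArgs"),
           valueClass = "integer")

# Signature 1: (integer, integer)
setMethod("combineArgs", signature("integer", "integer"),
          function(x, y) x + y)

# Signature 2: (matrix, <missing>)
setMethod("combineArgs", signature("matrix", "missing"),
          function(x, y) as.integer(sum(x)))

combineArgs(1L, 2L)
combineArgs(matrix(1:4, 2))
# combineArgs(1.5, 2)   # error: no method for (numeric, numeric)
```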
- Lazy evaluation is an issue that we have to deal with.
Basically, we cannot support lazy evaluation of the
arguments that are referenced in the type specification signatures.
We may want to force the evaluation of all the
arguments referenced in any of the signatures,
or just those in the signatures we go through to find one that matches.
The former gives clear semantics; the latter
gives us unpredictable behavior but avoids overhead.
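To make the eager option concrete, here is a sketch using a hypothetical helper forceSignatureArgs(). It forces every argument named in any signature before matching begins. The note() helper only records when each argument expression is actually evaluated.

```r
# Record the order in which argument expressions are evaluated.
evaluated = character()
note = function(label, value) {
  evaluated <<- c(evaluated, label)
  value
}

# Hypothetical helper for the eager strategy: force every supplied argument
# that appears in any signature, before trying to match.
forceSignatureArgs = function(signatures, env) {
  argNames = unique(unlist(lapply(signatures, names)))
  for(nm in argNames)
    if(!eval(call("missing", as.name(nm)), env))
      get(nm, envir = env)   # looking up a promise forces it (only once)
  invisible(argNames)
}

g = function(x, y, z) {
  sigs = list(c(x = "integer", y = "integer"), c(x = "matrix", z = "logical"))
  forceSignatureArgs(sigs, environment())
  note("body", NULL)
  "done"
}

g(note("x", 1L), note("y", 2L), note("z", TRUE))
evaluated   # "x" "y" "z" "body"
```

Under the lazy alternative, a checker that tries the signatures in order and stops at the first match would force x and y, find that the first signature matches, and never evaluate z for this call.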
- We want two different ways to specify
the types.
One is as above, giving the signature
of a particular call.
The other is a set of the different
possible types for each individual argument.
For example, we might write
list(x = c("matrix", "integer", "A"),
     y = c("logical", "character"))
And this would mean that we match
the class of x to any of those in the
signature type for that parameter,
and separately match the class of the argument
y to the types in that parameter specification.
In R code, this is
any(class(x) %in% signature$x) && any(class(y) %in% signature$y)
We can use a different class to represent this grid of
types rather than a list of simultaneous types.
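The two kinds of specification can be given distinct classes so that a single generic does the matching. In this sketch, typeGrid(), signatureList() and matchesTypes() are hypothetical names. is() is used rather than class() so that subclasses also match. "A" stands for a user-defined class, as in the example above.

```r
library(methods)

# Hypothetical constructors giving the two kinds of specification their own classes.
signatureList = function(...) structure(list(...), class = "SignatureList")
typeGrid = function(...) structure(list(...), class = "TypeGrid")

# TRUE if 'value' is an instance of any of the classes in 'types'.
argMatches = function(value, types)
  any(vapply(types, function(cl) is(value, cl), logical(1)))

matchesTypes = function(spec, args) UseMethod("matchesTypes")

# Grid: each argument is matched separately against its own set of types.
matchesTypes.TypeGrid = function(spec, args)
  all(vapply(names(spec), function(nm)
        nm %in% names(args) && argMatches(args[[nm]], spec[[nm]]),
      logical(1)))

# Signature list: one signature must match all of its arguments at once.
matchesTypes.SignatureList = function(spec, args)
  any(vapply(spec, function(sig)
        all(vapply(names(sig), function(nm)
              nm %in% names(args) && is(args[[nm]], sig[[nm]]),
            logical(1))),
      logical(1)))

grid = typeGrid(x = c("matrix", "integer", "A"), y = c("logical", "character"))
sigs = signatureList(c(x = "integer", y = "integer"), c(x = "matrix", z = "logical"))

matchesTypes(grid, list(x = 1L, y = "a"))   # TRUE
matchesTypes(sigs, list(x = 1L, z = TRUE))  # FALSE
```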
- Keeping an eye on generating GUIs from functions,
we want to be able to specify whether a parameter
should be represented as a radio button
or a checkbox.
Something that we did in another project is to have a method
for a "type" that constructs the GUI element.
We would need to dispatch on both the type and
the target context, e.g. a GUI toolkit (RGtk, tcltk) or an HTML
form.
An enumeration would map to radio buttons.
A vector would map to checkboxes so that one could
specify several values.
(We also need a ScalarInteger, etc. to indicate that
the input is a single element rather than a vector.)
We should look at XForms,
HTML forms, and InfoPath (and .xsd) for information. But it is clear we
need some facility to specify a quantity.
Schemas also come to mind, as do contracts.
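Here is a sketch of dispatching on both the parameter type and the target context with S4 methods. The classes Enumeration, MultiChoice and HTMLForm and the generic makeWidget() are all hypothetical. Only the HTML form context is implemented. A tcltk or RGtk context would be one more method on the same generic.

```r
library(methods)

# Hypothetical parameter types: an Enumeration allows one of 'values';
# a MultiChoice (a vector-valued parameter) allows several.
setClass("Enumeration", slots = c(name = "character", values = "character"))
setClass("MultiChoice", contains = "Enumeration")

# Hypothetical target context.
setClass("HTMLForm", slots = c(action = "character"))

setGeneric("makeWidget", function(type, context) standardGeneric("makeWidget"))

# Enumeration in an HTML form: one radio button per value.
setMethod("makeWidget", signature("Enumeration", "HTMLForm"),
  function(type, context)
    sprintf('<input type="radio" name="%s" value="%s"/>%s',
            type@name, type@values, type@values))

# MultiChoice in an HTML form: checkboxes, so several values can be chosen.
setMethod("makeWidget", signature("MultiChoice", "HTMLForm"),
  function(type, context)
    sprintf('<input type="checkbox" name="%s" value="%s"/>%s',
            type@name, type@values, type@values))

form = new("HTMLForm", action = "/run")
makeWidget(new("Enumeration", name = "method", values = c("lm", "glm")), form)
makeWidget(new("MultiChoice", name = "vars", values = c("a", "b", "c")), form)
```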
- How to handle ...?
- Robert argues that if we are inclined to put
type information on an "argument" that is not a formal
argument/parameter, then that argument should become a parameter.
We have to handle the cases of legacy code. So if the author of
the method.
- We can allow the signatures to be class names,
expressions, or functions.
Then these can do dynamic/content checking.
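Here is a sketch of one way the three forms could be interpreted for a single argument. checkArg() is a hypothetical helper. The convention that an expression refers to the argument as value is also an assumption.

```r
library(methods)

# Hypothetical: an argument specification may be
#   - a character vector of class names (any one may match),
#   - a predicate function applied to the argument value, or
#   - an unevaluated expression in which the argument is called 'value'.
checkArg = function(value, spec) {
  if(is.character(spec))
    any(vapply(spec, function(cl) is(value, cl), logical(1)))
  else if(is.function(spec))
    isTRUE(spec(value))
  else if(is.language(spec))
    isTRUE(eval(spec, list(value = value)))
  else
    stop("unsupported type specification")
}

checkArg(1:3, "integer")                         # class name
checkArg(c(1, 2), function(v) length(v) == 2)    # predicate function
checkArg(c(2, 3), quote(all(value > 0)))         # content-based expression
```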
- We might like the return type to be computed from the inputs,
e.g. ifelse(x < 1, "A", "B").
See ff in tests/return.S.
- Handling return() is problematic at the S-language level.
We need to be able to check the return value after it has been computed,
or as it is being returned, but before control is returned to the
caller of the function. There is a nice trick we can use,
which is to explicitly call checkReturnValue() with the return value,
and a very special expression return(value).
Also, we want the signature that was matched in the call to checkArgs().
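Here is a sketch of the checkReturnValue() trick, combined with the ifelse() return type from the previous item. It assumes, hypothetically, that the expected return type is stored in a "returns" attribute of the function. That attribute holds either class names or an unevaluated expression computed from the arguments. A and B are toy classes, and classify() is an illustrative function, not the ff in tests/return.S.

```r
library(methods)

# Hypothetical toy classes for the two possible return types.
setClass("A", slots = c(x = "numeric"))
setClass("B", slots = c(x = "numeric"))

# Sketch: check 'value' against the "returns" attribute of the calling
# function before handing it back to that function.
checkReturnValue = function(value) {
  fun = sys.function(sys.parent())   # the function whose value we check
  env = parent.frame()               # its frame, holding the arguments
  spec = attr(fun, "returns")
  if(is.null(spec))
    return(value)
  expected = if(is.language(spec)) eval(spec, env) else spec
  if(!any(vapply(expected, function(cl) is(value, cl), logical(1))))
    stop("returned an object of class ", class(value)[1],
         " but expected ", paste(expected, collapse = " or "), call. = FALSE)
  value
}

classify = function(x) {
  if(x < 1)
    return(checkReturnValue(new("A", x = x)))   # explicit return()
  checkReturnValue(new("B", x = x))             # implicit return
}
attr(classify, "returns") = quote(ifelse(x < 1, "A", "B"))

# A deliberately wrong function with the same return specification.
wrong = function(x) checkReturnValue(new("A", x = x))
attr(wrong, "returns") = quote(ifelse(x < 1, "A", "B"))

is(classify(0.5), "A")   # TRUE
is(classify(2), "B")     # TRUE
# wrong(5)               # error: returned an object of class A but expected B
```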
- Use Luke's
codetools package
to also analyze a function for free variables, etc.
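For example, codetools, a recommended package shipped with R, provides findGlobals(). With merge = FALSE it separates the free variables of a function from the functions it calls. The function scaleBy() is only an illustration.

```r
library(codetools)

# 'offset' is neither an argument nor a local variable, so it is free.
scaleBy = function(x) {
  total = x + offset
  log(total)
}

globals = findGlobals(scaleBy, merge = FALSE)
globals$variables   # free variables, including "offset"
globals$functions   # functions called in the body
```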
- Use the PCCTS/Antlr parser generator to provide an extended R syntax
that maps to the existing internal data structures.
This is complicated, but potentially better.
It allows the type information to be specified within the
definition of the function itself, using
expressions like
foo =
function(matrix x, integer y, z) {
}
Unfortunately this may not
- Make a checkArgs() function available so that the
author of a function can invoke the type checking; otherwise,
we don't check. checkArgs() can be called with no arguments,
a character vector of argument names, or the argument sublist itself, to force
the evaluation of the arguments.
checkArgs() will reach into the calling frame of the function to be
checked,
get the meta-data for the type information and compare the arguments in the
frame as appropriate.
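Here is a sketch of checkArgs() along these lines. The name comes from the text, but this implementation and its which argument, which restricts the check to the named arguments, are assumptions. It reads the "signatures" attribute of the calling function and forces only the arguments it inspects. It returns the matched signature, which checkReturnValue() would also want to know.

```r
library(methods)

# Sketch of checkArgs(): call it from inside the function being checked.
# 'which' (an assumption) optionally restricts the check to the named arguments.
checkArgs = function(which = NULL) {
  fun = sys.function(sys.parent())   # the function that called checkArgs()
  env = parent.frame()               # that function's evaluation frame
  signatures = attr(fun, "signatures")
  if(is.null(signatures))
    return(invisible(NULL))          # no type information: nothing to check

  supplied = function(nm) !eval(call("missing", as.name(nm)), env)

  for(sig in signatures) {
    if(!is.null(which))
      sig = sig[names(sig) %in% which]
    # get() forces the promise for each argument that is inspected.
    ok = all(vapply(names(sig), function(nm)
               supplied(nm) && is(get(nm, envir = env), sig[[nm]]),
             logical(1)))
    if(ok)
      return(invisible(sig))         # the signature that matched
  }
  stop("the arguments do not match any signature of the calling function",
       call. = FALSE)
}

f = function(x, y, z) {
  sig = checkArgs()
  names(sig)
}
attr(f, "signatures") = list(c(x = "integer", y = "integer"),
                             c(x = "matrix", z = "logical"))

f(1L, 2L)                     # "x" "y"
f(matrix(1:4, 2), z = TRUE)   # "x" "z"
```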
[1] Additional information would be nice, such as estimates of how
long a method will take and how much memory it will consume for given inputs,
what sort of machine it is being run on, etc., but these are not
intrinsically code-related but rather server-based.
Duncan Temple Lang
Last modified: Fri Sep 30 07:20:21 PDT 2005