-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Derivatives for entire functions #7
Comments
I just do a symbolic rewrite of expressions, following the basic arithmetic rules and the chain rule. Nothing fancy is going on; the package was mostly intended as an example of manipulating expressions. So maybe the README shouldn't say automatic differentiation when I really mean symbolic... General functions won't be possible to deal with when taking this approach. I wouldn't know where to begin with symbolically rewriting a loop or an if-statement. Something like your example, though, should be simple enough. It is not substantially harder to write a function that gives you the derivative of a vector function since you just need to handle it component-wise. Likewise, it wouldn't be difficult to differentiate a vector function to get gradients. The hardest task I can see would be figuring out the limitations on what it can do and communicating that clearly to a user... If you could check that a function is purely functional, you could substitute variables for values everywhere and use the current functionality that would handle using variables. You couldn't do your exact example since you modify f <- function(x) {
foo <- a + x[1] * b
bar <- x[2] * foo
c(foo, bar)
} which would be rewritten as f <- function(x) {
c(a + x[1] * b, x[2] * (a + x[1] * b)
} and this you could differentiate to get the Jacobi matrix If the expressions have side-effects (and the difficult part will be to allow the user to call functions and at the same time ensure that they don't), then this rewrite will mess everything up. |
I want to use AD to calculate the jacobi matrices/functions for my package paropt. ode <- function() {
additional_code(
if(t > 10) {
a = sin(t)
} else {
a = cos(t)
}
)
odesystem(
dpreydt = prey*predator*c - predator*d,
dpredatordt = prey*a - prey*predator*b
)
} I have no idea how to handle for loops. If statements could be just ignored I guess, at least in the additional_code section. Not entirely sure about it. What do you think about this approach? I would actually prefer not to introduce a DSL, however this would mean to implement AD for R.... |
Sorry for the late reply. I looked at the code again. I don't know how to deal with Handling assignments in the functions shouldn't be a major issue either. It will involve catching a series of assignments before the main expression and substituting them in before rewriting. However, for an ODE, I am not sure what the purpose is. There, wouldn't you need to integrate rather than differentiate? That part is far from trivial, and I wouldn't even know where to begin for symbolic integration... |
No Problem. Regarding the Assignment should be pretty straightforward. The jacobi matrix is used during the numerical solving of an ode system. The nonlinear system occurring during each step (if using an implicit method which is best for stiff ode systems) is solved e.g. by a newton solver. Therefore, the jacobi matrix is required to solve the system. If the user does not define a function for the jacobi matrix is predicted using finite differences. This requires n evaluation of the ode-function, where n is the number of states in the ode-system. Beyond, that it is not an exact solution. That is the use case I'm thinking of. Do you think it is realistic to use dfdr for that approach? |
I think something like that is doable. For f <- function(x, y) {
a <- x*y
b <- (x - y)^2
c(a*b, a/b)
} would be translated into f <- function(x, y) c((x*y)*((x-y)^2), (x*y)/((x - y)^2)) and > gradient(f)
function (x, y)
c(c(y * ((x - y)^2) + (x * y) * (2 * (x - y)), (y * ((x - y)^2) -
(x * y) * (2 * (x - y)))/((x - y)^2)^2), c(x * ((x - y)^2) +
(x * y) * (2 * (x - y) * -1), (x * ((x - y)^2) - (x * y) *
(2 * (x - y) * -1))/((x - y)^2)^2)) But that substitution is impossible with f <- function(x, y) {
a <- if (x < y) x*y else (x - y)^2
c(a * x, a / x)
} or something like that because we don't know how to substitute It might be possible to deal with by generating the derivative for all combinations of branch paths, but that would be some pretty big functions, I fear. Anyway, I will implement the Jacobi matrix this week, and if I have time look into variable assignments and substitutions. |
Yes, you are right it is not enough to only warn the user (I was a bit too optimistic). I will think about the solution with the derivative for all combinations. However, this is probably impossible if the If I find a bit of time I will check for some of the remaining functions: Or is it possible to add your own functions to the function |
Dealing with I haven't thought too hard about how to add functions that the package can handle. It does look at the function environment, and I think I had the thought that it could either find things there or differentiate calls using the chain rule and differentiating called functions itself, but it breaks for builtin functions that it doesn't know about. That being said, if we have a list that maps functions to their derivatives, we could look them up there. With that approach, we could add the builtin functions in the package, a user could add his own, and if we run into a function that doesn't have a derivative we could apply |
I would suggest giving up on the idea with The approach with the central list is nice. I hope to find time this week to add the 'list approach'. Maybe we could also handle the trigonometric/math functions with the list. |
What about this design for the function - derivative pairs: fcts <- setClass("fcts",
slots = list(
f = "character",
dfdx = "character"
)
)
setGeneric(
name = "add_fct",
def = function(obj, f_new, dfdx_new) {
standardGeneric("add_fct")
}
)
setMethod(
f = "add_fct",
signature = "fcts",
definition = function(obj, f_new, dfdx_new) {
obj@f = c(obj@f, f_new)
obj@dfdx = c(obj@dfdx, dfdx_new)
return(obj)
}
)
f <- fcts(
f = "sin",
dfdx = "cos"
)
f <- add_fct(f, "cos", "-sin")
f |
I like it. But would it be better to have a list where we can map names to functions and their derivatives? It would give us a faster lookup, we could immediately use the functions (I think). fct <- setClass("fct",
slots = list(f = "function", dfdx = "function")
)
fcts <- setClass("fcts", slots = c(funs = "list"))
setMethod("initialize",
signature = "fcts",
def = function(.Object) {
.Object@funs <- list()
.Object
}
)
setGeneric(
name = "add_fct",
def = function(obj, name, f_new, dfdx_new) standardGeneric("add_fct")
)
setGeneric(
name = "get_fct",
def = function(obj, name) standardGeneric("get_fct")
)
setGeneric(
name = "get_derivative",
def = function(obj, name) standardGeneric("get_derivative")
)
setMethod(
f = "add_fct",
signature = "fcts",
definition = function(obj, name, f_new, dfdx_new) {
obj@funs[[name]] = fct(f=f_new, dfdx=dfdx_new)
obj
}
)
setMethod(
f = "get_fct",
signature = "fcts",
definition = function(obj, name) obj@funs[[name]]
)
setMethod(
f = "get_derivative",
signature = "fcts",
definition = function(obj, name) obj@funs[[name]]@dfdx
)
f <- fcts()
f <- add_fct(f, "sin", sin, cos)
f <- add_fct(f, "cos", cos, \(x) -sin(x))
f
get_fct(f, "sin")
get_derivative(f, "sin")
get_derivative(f, "cos") Of course, I am not entirely sure if we need the derivative as an expression rather than a symbol. But something like the string "-sin" would be hard to incorporate in the rewrite. |
yes, this is true. Using character was not the best way to go. I wasn't aware that S4 has the type "function" 🙈 |
Ok, I added the list in my local repo. However, I'm struggling with creating the call in Could you give me a hint how to implement it. Thanks diff_built_in_function_call <- lift(function(expr, x, fl) {
# chain rule with a known function to differentiate. df/dx = df/dy dy/dx
y <- call_arg(expr, 1)
dy_dx <- diff_expr(call_arg(expr, 1), x, fl)
name <- call_name(expr)
fct <- body(get_derivative(fl, name))
print(class(fct))
#*dy_dx
}) |
I don't have time to integrate this (nor really to test if it will always work), but I would do it something like this. You need to apply the chain rule, so you need to call the function derivative and then multiply with the derivative of the arguments: f <- function(x) x^2
testmap = c(
"f" = \(x) 2*x,
"sin" = cos
)
get_derivative <- function(fname) testmap[[fname]] It would, of course, be something like the S4 class, but this is just for testing. We can get the derivative using the > get_derivative("f")
\(x) 2*x
<bytecode: 0x7ff5336bb588>
> get_derivative("sin")
function (x) .Primitive("cos") Notice that there is a difference between primitive and non-primitive functions, which is something we have to deal with. Anyway, we don't need to modify the body of these functions or anything. When we call Something like this should work: call_derivative_expr <- function(expr, x, e) {
fname <- call_name(expr)
func <- fname |> get_derivative() |> purrr::when(
is.primitive(.) ~ . |> deparse() |> str2lang(), # Use the .Primitive as function
~ bquote( get_derivative( .(as.character(fname)) ) ) # Look up derivative
)
rlang::new_call(func, call_args(expr))
} We can test it: > # Testing hacks
> (ff <- rlang::new_function(alist(x=), call_derivative_expr(quote(f(x)), x)))
function (x)
get_derivative("f")(x)
> (gg <- rlang::new_function(alist(x=), call_derivative_expr(quote(sin(x)), x)))
function (x)
.Primitive("cos")(x) It is not that pretty--it would perhaps be nicer to write With that, we can differentiate (with some mockup code) like this: diff_expr <- function(expr, x, e) quote(2 * x) # Mock up
diff_built_in_function_call <- function(expr, x, e) {
# chain rule with a known function to differentiate. df/dx = df/dy dy/dx
y <- call_arg(expr, 1)
df_dy <- call_derivative_expr(expr, x, e)
dy_dx <- diff_expr(y, x, e)
bquote( .(df_dy) * .(dy_dx) )
} The result will look like this: > diff_built_in_function_call(quote(f(x^2)), x, NULL)
get_derivative("f")(x^2) * (2 * x) Or you can make a function out of it like this: df <- rlang::new_function(alist(x=), diff_built_in_function_call(quote(f(x^2)), x, NULL)) Does any of that make sense? |
I added a pull request with a working lookup in the fl list. The list contains know only a name for a function and a function defining the derivative. For instance the function for sin is: function(x) bquote(cos(.(x))) Do we still need |
There are two tests failing now. One is error handling, so ignore that. The other is an example I think is worth dealing with. f <- function(x, y) x^2 * y
g <- function(z) f(2*z, z^2) Differentiating That is what the general function computes. Just not well, I admit. But it was an attempt. It would call With the new list, we can get the derivative directly, so we should be able to substitute it in. It solves that part. But it doesn't eliminate the need for the chain rule when the arguments to a function are expressions. It looks simple to update the function: instead of getting the general function from the environment, we look it up in the list, and we get the derivative from there as well. With substitutions we might even be able to speed up calculations; we could substitute expressions in for the variables in the function body and the simplify. We have a design decision to make, however: should we make the user add new functions to the list every time they use a function? Otherwise, I don't know what the right design is, I haven't thought about it enough--I'm sorry, but I'm thinking a bit slow these days as I'm super busy with other tasks--but I do think that something that handles general calls is necessary. Otherwise, I don't see how you handle cases like the example above. |
I prefer to force the user to add the derivative of
That is also true for me. Therefore, the code on Friday was not optimal... The next step is to work on functions that get more than one argument. Currently, this is not working. I have a very messy solution, which I will improve now |
I'm agnostic on whether it is best to explicitly adding One problem I see with handling multiple arguments is that you cannot get the number of arguments from the primitives. From the others you can, and you could easily insert the gradient in the list instead of what we do now with just a single argument. The |
r <- args(sin)
formalArgs(r) |> length() This works also for primitives I think. |
Ah, maybe it does now. It's been a while since I worked with that :) |
I'm still working on the function
fct <- setClass(
"fct",
slots = list(
name = "character",
dfdx = "function",
name_deriv = "character"
)
)
diff_built_in_function_call <- lift(function(expr, x, fl) {
# chain rule with a known function to differentiate. df/dx = df/dy dy/dx
name <- call_name(expr)
name_deriv <- get_derivative_name(fl, name)
len <- length(expr)
args <- sapply(seq_along(2:len), function(x) call_arg(expr, x))
dy_dx <- sapply(args, function(as) diff_expr(as, x, fl) )
outer_deriv <- do.call(call, c(name_deriv, args), quote = TRUE)
entire_deriv <- NULL
for(i in seq_along(dy_dx)) {
inner_deriv <- dy_dx[[i]]
entire_deriv = c(entire_deriv, bquote(.(inner_deriv) * .(outer_deriv)) )
}
for(i in seq_along(entire_deriv)) {
deriv_current <- entire_deriv[[i]]
deriv_current <- simplify_expr(deriv_current)
if(deriv_current == 0) {
entire_deriv[[i]] <- NA
} else {
entire_deriv[[i]] <- deriv_current
}
}
entire_deriv <- entire_deriv[!is.na(entire_deriv)]
if(len > 2) {
entire_deriv <- paste(entire_deriv, collapse = "+")
}
str2lang(entire_deriv)
}) Is it really true that I need more than one derivative function if the original function receives more than one argument? |
For a multivariate function so the chain rule wants the sum of the partial derivatives It doesn't really matter if the For general functions where I don't think we need to consider gradients and Jacobi matrices as separate cases; they are just one versus more-than-one dimensions. And a univerate function If we say that the derivative we store in |
Regarding the jacobi matrix, I would propose using a vector-based function as primary case. For instance: f <- function(x) {
y = numeric(length(x))
y[1] = x[1]^2
y[2] = x[2]*3
return(y)
} Could you give me an example where this approach does not work? |
I have pondered over the jacobian function. My approach would have the following design: extractast <- function(code) {
if(!is.call(code)) {
return(code)
}
fct <- code[[1]]
if( (as.name("<-") == fct) || (as.name("=") == fct) ) {
} else if(as.name("[") == fct) {
}
code <- as.list(code)
lapply(code, extractast)
}
#' Compute the jacobian-function of a function.
#'
#' Creates a function that computes the jacobian matrix of a function with respect to one parameter
#' and returns a matrix of these.
#'
#' @param f A function
#' @param y The variable to compute the derivative for. \eq{dydx = ...}
#' @param x defines the dependent variable for which the variable shall be calculated. \eq{dxdx = ...}
#' @return A function that computes the jacobian of f at any point.
#' @export
jacobian <- function(f, y, x) {
stopifnot("the function is missing"=!is.null(f))
stopifnot("the variable y is missing"=!is.null(y))
stopifnot("the variable x is missing"=!is.null(x))
brackets <- body(f)[[1]]
body <- NULL
if(brackets != as.name("{")) {
body <- body(f)
} else {
body <- body(f)[2:length(body(f))]
}
for(i in seq_along(body)) {
ast <- extractast(body[[i]])
}
}
f <- function(t, x) {
y <- numeric(length(x))
a <- 2*x[2]
if(t < 10) {
a <- 3*x[2]
}
y[1] = x[1]^2
y[2] = a
return(y)
}
# this results in:
jac <- function(t, x) {
jac_mat <- matrix(0, length(x), length(x))
y <- numeric(length(x))
a <- 2*x[2]
if(t < 10) {
a <- 3*x[2]
}
y[1] = x[1]^2
y[2] = a
jac_mat[1, 1] = 2*x[1]
jac_mat[2, 1] = 0
jac_mat[1, 2] = 2
jac_mat[2, 2] = 0
return(jac_mat)
} What do you think about this approach? |
I am confused about what you are doing here. If where The To get the Jacobian, we just need a matrix that for each row has the If we can already compute the gradient of a multi-variate function, perhaps restricted to some input parameters, we just need to compute this for all the output variables. My thinking was that if a function returns a vector (in some form using f <- function(x) {
...
c(expr1, expr2, ..., exprm)
} then we just needed to get the gradient for each or Does that make sense? |
I had in mind that the R code has to be translated to C++ code via my package ast2ast. I have to think about a solution for the problem.... although it is not directly related to dfdr. So what is than missing for dfdr? |
Since this function would generate an R function with all the |
This approach is very nice. However, I think it will not work if it gets a vector as input which is subsetted. How would we substitute these cases? f <- function(x) {
y[1] = ...
....
} And we cannot use (at least the current version of 'df') for multiline code |
If The current version cannot deal with multiple expressions, no, but it would not be difficult to deal with a limited number of assignments in the body. The problem is only if you assign to a variable that you then use in multiple expressions. Then you need to apply the chain rule everywhere, and keeping track of that could be difficult. Not impossible, of course, you could do all the substitutions before the differentiation and you would end up the right place. But anything like a loop where you assign for (i in ...) {
y[i] = x[i-1] * x[i + 1] * ... something
} would be difficult to track. If we restrict to assignments to constants and lists/arrays, I don't think it will be that difficult. |
Sorry for the confusion with my last approach, I had the same in mind as you just described. However, the code was too complicated for this problem. I would restrict the user to constant vectors. Should the code for this problem be included in 'dfdr' or should I integrate it in paropt or ast2ast? |
It could go either way. I think I can already handle the case with f <- function(x, y, z) {
c(x*y*z, x-x^y, x*y / z, ...)
} (unless I forgot to commit it). Changing it to f <- function(x) { # x is now a vector
c(x[1]*x[2]*x[3], x[1] - x[1]^x[2], x[1] * x[2] / x[3])
} shouldn't be too difficult, if we require that the index in The difficult part would be if we attempted both generic variable Anyway, I think it could go in both packages, and it would probably be with the same difficulty. |
I would suggest integrating the code into 'dfdr' as not too much further work is needed to finish it. I would define the following restrictions:
Then it would be fairly easy to implement it. First, we had to check each line and find the variables on the left and right sides. If we find something different from y (the vector returned from the function) the variable has to be replaced within the following lines. Secondly, Should I implement it? |
That sounds like reasonable restrictions to me. We can always worry about vector output at another time. It would be great if you have time to implement it. I, unfortunately, am a bit overworked at the moment. |
I'm working on it. Is there a way to call 'd' for x[1]? Currently, all the index calls are replaced with e.g. x1 etc. Actually, it is also possible to use if, else if and else. The function is than a piece wise defined function (this is the way Mathematica and the wolfram language handle it). It is only necessary to keep the original code and add additional the derivatives. |
Not right now, but it is possible to get. It requires that we change the way the parse_derivative_var <- function(x) {
# capture x as a call
x <- rlang::enexpr(x)
if (rlang::call_name(x) == "[") {
# x is a reference into an array, and we can get the array
# and the index
structure(rlang::call_args(x), names = c("array", "index"))
} else {
x # x is just the variable
}
}
> var <- parse_derivative_var(x[2])
> var
$array
x
$index
[1] 2 If you want to call it indirectly, so you have a function that parses an argument to such a lazy evaluation function, you need to capture the expression with # to preserve the lazy eval, using !!enexpr(x)
f <- function(x) {
# you cannot do parse_derivative_var(x)
# but you can parse along x using !!enexpr(x)
x <- parse_derivative_var(!!rlang::enexpr(x))
x
}
> f(x[4])
$array
x
$index
[1] 4 I don't know if we need that. We can, however, capture the df <- function(expr, x) {
# we just take the expr here. no checking in this example
x <- rlang::enexpr(x)
# testing hack
if (expr == x) print("equal") else print("unequal")
}
> df(quote(y[1]^2+y[2]), y[1])
[1] "unequal"
> df(quote(y[1]), y[1])
[1] "equal"
I think using the quoted expressions for derivatives (probably after checking that they are valid indices) should be safer.
I haven't used Mathematica, but if it works there, then that is a great way to do it :) |
Now we can call All tests run correctly except the Error handling test. I'm a bit puzzled about the purpose of |
I think my thought was simply that the differentiation function should throw an error if it encountered a function it didn't know how to deal with. We cannot use the chain rule unless we can differentiate the function calls in the expressions, so it shouldn't silently accept such expressions. I think the right solution is to require that any function called in the expressions should be found in the list we have of derivatives. Wouldn't that work? Isn't that the same as what you are saying? |
ok, perfect that is exactly what I thought. I will remove the function and just throw an error if any function is found not included in |
Sounds perfect! |
Most of the function to create the jacobian function is already implemented and I have added a bunch of tests. See the last pull request. However, I have found an issue that is a bit difficult to handle. Suppose the user creates a vector to store the derivatives. For example: Any idea how to handle this? In the context of the jacobian function, I would suggest that the line where |
I honestly don't know. Is the use-case something like this? f <- function(x) {
y <- numeric(2)
y[1] <- x**2
y[2] <- 2*x
y
} |
yes that is the case I was struggling with. |
The question is where this should be handled. One possibility would be that the function f <- function(x) x*rep(3.14, 1) In order to do this an additional attribute has to be added to the fct_deriv list. The other possibility is to handle this case only for the jacobian function. Meaning that the lines where functions such as jac <- function(x) {
y <- numeric(2) # allowed
y <- numeric(2)*x[1] # not allowed
} I personally prefer the first approach. As the second one is complicated to explain to the user and adds additional restrictions. What is your opinion? |
I agree that a warning is probably the best solution. We definitely do not want to fail without warnings. |
Could the approach in the function 'd' be used for more complicated functions. For instance:
Is it actually automatic differentiation in forward or reverse mode?
The text was updated successfully, but these errors were encountered: