Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Derivatives for entire functions #7

Open
Konrad1991 opened this issue Oct 20, 2022 · 43 comments
Open

Derivatives for entire functions #7

Konrad1991 opened this issue Oct 20, 2022 · 43 comments

Comments

@Konrad1991
Copy link
Owner

Could the approach in the function 'd' be used for more complicated functions. For instance:

f <- function(y) {
y[1] = a + y[1]*b
y[2] = y[2]*y[1]
return(y)
}

Is it actually automatic differentiation in forward or reverse mode?

@mailund
Copy link
Collaborator

mailund commented Oct 21, 2022

I just do a symbolic rewrite of expressions, following the basic arithmetic rules and the chain rule. Nothing fancy is going on; the package was mostly intended as an example of manipulating expressions. So maybe the README shouldn't say automatic differentiation when I really mean symbolic...

General functions won't be possible to deal with when taking this approach. I wouldn't know where to begin with symbolically rewriting a loop or an if-statement. Something like your example, though, should be simple enough. It is not substantially harder to write a function that gives you the derivative of a vector function since you just need to handle it component-wise. Likewise, it wouldn't be difficult to differentiate a vector function to get gradients.

The hardest task I can see would be figuring out the limitations on what it can do and communicating that clearly to a user...

If you could check that a function is purely functional, you could substitute variables for values everywhere and use the current functionality that would handle using variables. You couldn't do your exact example since you modify y, but you could do

f <- function(x) {
  foo <- a + x[1] * b
  bar <- x[2] * foo
  c(foo, bar)
}

which would be rewritten as

f <- function(x) {
  c(a + x[1] * b, x[2] * (a + x[1] * b)
}

and this you could differentiate to get the Jacobi matrix

$\frac{\partial f}{\partial x} = \left( \matrix{\partial f_1/\partial x_1 &amp; \partial f_1/\partial x_2 \cr \partial f_2/\partial x_1 &amp; \partial f_2/\partial x_2} \right)$

If the expressions have side-effects (and the difficult part will be to allow the user to call functions and at the same time ensure that they don't), then this rewrite will mess everything up.

@Konrad1991
Copy link
Owner Author

Konrad1991 commented Oct 21, 2022

I want to use AD to calculate the jacobi matrices/functions for my package paropt.
Currently, the user specifies the ode system within R. Afterwards, the R function is translated to C++ using ast2ast. So it would be super cool, if the jacobi matrices are calculated automatically. This is much more efficient than using finite difference methods. My idea was to extend dfdr thus it supports all the functions in ast2ast. Moreover, it is necessary to define a specific DSL that has to be used within paropt. For example:

ode <- function() {
 
  additional_code(
    if(t > 10) {
      a = sin(t)
    } else {
     a = cos(t)
   }
  )
  
  odesystem(
    dpreydt = prey*predator*c - predator*d,
    dpredatordt = prey*a - prey*predator*b  
  )

}

I have no idea how to handle for loops. If statements could be just ignored I guess, at least in the additional_code section. Not entirely sure about it.

What do you think about this approach? I would actually prefer not to introduce a DSL, however this would mean to implement AD for R....

@mailund
Copy link
Collaborator

mailund commented Oct 25, 2022

Sorry for the late reply.

I looked at the code again. I don't know how to deal with if-statements as they tend to make non-differential functions, but it isn't a problem to handle functions from vectors to vectors, I think. It can do most of that already. Then, getting the Jacobi matrix is straightforward.

Handling assignments in the functions shouldn't be a major issue either. It will involve catching a series of assignments before the main expression and substituting them in before rewriting.

However, for an ODE, I am not sure what the purpose is. There, wouldn't you need to integrate rather than differentiate? That part is far from trivial, and I wouldn't even know where to begin for symbolic integration...

@Konrad1991
Copy link
Owner Author

No Problem.

Regarding the ìf statement it might be a good idea to throw a warning telling the user that it might be risky to use them within the ode system....

Assignment should be pretty straightforward.

The jacobi matrix is used during the numerical solving of an ode system. The nonlinear system occurring during each step (if using an implicit method which is best for stiff ode systems) is solved e.g. by a newton solver. Therefore, the jacobi matrix is required to solve the system. If the user does not define a function for the jacobi matrix is predicted using finite differences. This requires n evaluation of the ode-function, where n is the number of states in the ode-system. Beyond, that it is not an exact solution. That is the use case I'm thinking of.

Do you think it is realistic to use dfdr for that approach?

@mailund
Copy link
Collaborator

mailund commented Oct 25, 2022

I think something like that is doable. For if-statements, though, I don't think a warning can do it. The easy way to handle assignments is as substitutions such that a function like this

f <- function(x, y) {
  a <- x*y
  b <- (x - y)^2
  c(a*b, a/b)
}

would be translated into

f <- function(x, y) c((x*y)*((x-y)^2), (x*y)/((x - y)^2))

and

> gradient(f)
function (x, y) 
c(c(y * ((x - y)^2) + (x * y) * (2 * (x - y)), (y * ((x - y)^2) - 
    (x * y) * (2 * (x - y)))/((x - y)^2)^2), c(x * ((x - y)^2) + 
    (x * y) * (2 * (x - y) * -1), (x * ((x - y)^2) - (x * y) * 
    (2 * (x - y) * -1))/((x - y)^2)^2))

But that substitution is impossible with

f <- function(x, y) {
  a <- if (x < y) x*y else (x - y)^2
  c(a * x, a / x)
}

or something like that because we don't know how to substitute a if we don't know what it is. And substitution is the easiest way to handle it, as I don't know what the chain rule is for if.

It might be possible to deal with by generating the derivative for all combinations of branch paths, but that would be some pretty big functions, I fear.

Anyway, I will implement the Jacobi matrix this week, and if I have time look into variable assignments and substitutions.

@Konrad1991
Copy link
Owner Author

Yes, you are right it is not enough to only warn the user (I was a bit too optimistic). I will think about the solution with the derivative for all combinations. However, this is probably impossible if the if term depends on input parameter as the type is not known.

If I find a bit of time I will check for some of the remaining functions: [, cmr (catmull rome spline), vector, matrix etc.

Or is it possible to add your own functions to the function d,? I saw that it is looking for functions in a specific environment. But I didn't get the code completely of diff_general_function_call.

@mailund
Copy link
Collaborator

mailund commented Oct 27, 2022

Dealing with if statements will certainly also require restrictions to which subset of R you can use for the functions, e.g. making assumptions about types and the domain of the function. I can only imagine that it will get messy :)

I haven't thought too hard about how to add functions that the package can handle. It does look at the function environment, and I think I had the thought that it could either find things there or differentiate calls using the chain rule and differentiating called functions itself, but it breaks for builtin functions that it doesn't know about.

That being said, if we have a list that maps functions to their derivatives, we could look them up there. With that approach, we could add the builtin functions in the package, a user could add his own, and if we run into a function that doesn't have a derivative we could apply d() to it to get it and then add it. Something like that could work, I think.

@Konrad1991
Copy link
Owner Author

I would suggest giving up on the idea with if :(

The approach with the central list is nice. I hope to find time this week to add the 'list approach'. Maybe we could also handle the trigonometric/math functions with the list.

@Konrad1991
Copy link
Owner Author

What about this design for the function - derivative pairs:

fcts <- setClass("fcts",
    slots = list(
      f = "character",
      dfdx = "character"
    )
)

setGeneric(
  name = "add_fct",
  def = function(obj, f_new, dfdx_new) {
    standardGeneric("add_fct")
  }
)

setMethod(
  f = "add_fct",
  signature = "fcts",
  definition = function(obj, f_new, dfdx_new) {
    obj@f = c(obj@f, f_new)
    obj@dfdx = c(obj@dfdx, dfdx_new)
    return(obj)
  }
)

f <- fcts(
  f = "sin",
  dfdx = "cos"
)

f <- add_fct(f, "cos", "-sin")
f

@mailund
Copy link
Collaborator

mailund commented Oct 27, 2022

I like it. But would it be better to have a list where we can map names to functions and their derivatives? It would give us a faster lookup, we could immediately use the functions (I think).

fct <- setClass("fct",
    slots = list(f = "function", dfdx = "function")
)
fcts <- setClass("fcts", slots = c(funs = "list"))
setMethod("initialize",
          signature = "fcts",
          def = function(.Object) {
              .Object@funs <- list()
              .Object
          }
)

setGeneric(
  name = "add_fct",
  def = function(obj, name, f_new, dfdx_new) standardGeneric("add_fct")
)
setGeneric(
  name = "get_fct",
  def = function(obj, name) standardGeneric("get_fct")
)
setGeneric(
  name = "get_derivative",
  def = function(obj, name) standardGeneric("get_derivative")
)

setMethod(
  f = "add_fct",
  signature = "fcts",
  definition = function(obj, name, f_new, dfdx_new) {
    obj@funs[[name]] = fct(f=f_new, dfdx=dfdx_new)
    obj
  }
)

setMethod(
  f = "get_fct",
  signature = "fcts",
  definition = function(obj, name) obj@funs[[name]]
)
setMethod(
  f = "get_derivative",
  signature = "fcts",
  definition = function(obj, name) obj@funs[[name]]@dfdx
)

f <- fcts()
f <- add_fct(f, "sin", sin, cos)
f <- add_fct(f, "cos", cos, \(x) -sin(x))
f

get_fct(f, "sin")
get_derivative(f, "sin")
get_derivative(f, "cos")

Of course, I am not entirely sure if we need the derivative as an expression rather than a symbol. But something like the string "-sin" would be hard to incorporate in the rewrite.

@Konrad1991
Copy link
Owner Author

yes, this is true. Using character was not the best way to go. I wasn't aware that S4 has the type "function" 🙈

@Konrad1991
Copy link
Owner Author

Ok, I added the list in my local repo. However, I'm struggling with creating the call in diff_built_in_function_call.

Could you give me a hint how to implement it. Thanks

diff_built_in_function_call <- lift(function(expr, x, fl) {
  # chain rule with a known function to differentiate. df/dx = df/dy dy/dx
  y <- call_arg(expr, 1)
  dy_dx <- diff_expr(call_arg(expr, 1), x, fl)
  name <- call_name(expr)
  fct <- body(get_derivative(fl, name))
  print(class(fct))
  #*dy_dx
})

@mailund
Copy link
Collaborator

mailund commented Oct 28, 2022

I don't have time to integrate this (nor really to test if it will always work), but I would do it something like this.

You need to apply the chain rule, so you need to call the function derivative and then multiply with the derivative of the arguments: $\mathrm{d}f/\mathrm{d}x = \mathrm{d}f/\mathrm{d}y \cdot \mathrm{d}y/\mathrm{d}f$. In the code, that means that we need to get the expression for calling the derivative of $f$, $\mathrm{d}f/\mathrm{d}y$. Let's say we can look that up in some table. I'm going to muck up one that looks like this:

f <- function(x) x^2
testmap = c(
  "f" = \(x) 2*x,
  "sin" = cos
)
get_derivative <- function(fname) testmap[[fname]]

It would, of course, be something like the S4 class, but this is just for testing. We can get the derivative using the get_derivative() function:

> get_derivative("f")
\(x) 2*x
<bytecode: 0x7ff5336bb588>
> get_derivative("sin")
function (x)  .Primitive("cos")

Notice that there is a difference between primitive and non-primitive functions, which is something we have to deal with.

Anyway, we don't need to modify the body of these functions or anything. When we call $\mathrm{d}f/\mathrm{d}y$, we just call $f'(y)$. So what we need to do is create that call. It only involves changing the name of the call, 'cause the arguments are already there.

Something like this should work:

call_derivative_expr <- function(expr, x, e) {
  fname <- call_name(expr)
  func <- fname |> get_derivative() |> purrr::when(
    is.primitive(.) ~ . |> deparse() |> str2lang(),       # Use the .Primitive as function
    ~ bquote( get_derivative( .(as.character(fname)) ) )  # Look up derivative
  )
  rlang::new_call(func, call_args(expr))
}

We can test it:

> # Testing hacks
> (ff <- rlang::new_function(alist(x=), call_derivative_expr(quote(f(x)), x)))
function (x) 
get_derivative("f")(x)
> (gg <- rlang::new_function(alist(x=), call_derivative_expr(quote(sin(x)), x)))
function (x) 
.Primitive("cos")(x)

It is not that pretty--it would perhaps be nicer to write cos(x) instead of .Primitive("cos")(x) and the lookup in the function call is ugly, but this will work. I think. It depends on the environment and whether the evaluation environment can find get_derivative(), but that is something that can be fixe.

With that, we can differentiate (with some mockup code) like this:

diff_expr <- function(expr, x, e) quote(2 * x) # Mock up
diff_built_in_function_call <- function(expr, x, e) {
  # chain rule with a known function to differentiate. df/dx = df/dy dy/dx
  y <- call_arg(expr, 1)
  df_dy <- call_derivative_expr(expr, x, e)
  dy_dx <- diff_expr(y, x, e)
  bquote( .(df_dy) * .(dy_dx) )
}

The result will look like this:

> diff_built_in_function_call(quote(f(x^2)), x, NULL)
get_derivative("f")(x^2) * (2 * x)

Or you can make a function out of it like this:

df <- rlang::new_function(alist(x=), diff_built_in_function_call(quote(f(x^2)), x, NULL))

Does any of that make sense?

@Konrad1991
Copy link
Owner Author

I added a pull request with a working lookup in the fl list. The list contains know only a name for a function and a function defining the derivative. For instance the function for sin is:

function(x) bquote(cos(.(x)))

Do we still need diff_general_function_call?

@mailund
Copy link
Collaborator

mailund commented Oct 31, 2022

There are two tests failing now. One is error handling, so ignore that. The other is an example I think is worth dealing with.

  f <- function(x, y) x^2 * y
  g <- function(z) f(2*z, z^2)

Differentiating f is no problem here, but g could be. With g, we have a call to f, and we can see what the derivative should be from the chain rule.

$\frac{\mathrm{d}g}{\mathrm{d}z} = \frac{\mathrm{d}f}{\mathrm{d}x}\frac{\mathrm{d}x^2y}{\mathrm{d}x} + \frac{\mathrm{d}f}{\mathrm{d}y}\frac{\mathrm{d}x^2y}{\mathrm{d}y}$

That is what the general function computes. Just not well, I admit. But it was an attempt. It would call d(f,x) and d(f,y) where these were needed, so as part of evaluating the expression, it would compute the derivative. Slow stuff here.

With the new list, we can get the derivative directly, so we should be able to substitute it in. It solves that part. But it doesn't eliminate the need for the chain rule when the arguments to a function are expressions.

It looks simple to update the function: instead of getting the general function from the environment, we look it up in the list, and we get the derivative from there as well. With substitutions we might even be able to speed up calculations; we could substitute expressions in for the variables in the function body and the simplify.

We have a design decision to make, however: should we make the user add new functions to the list every time they use a function? Otherwise, g wouldn't know where to find f. Or should we do it automatically, e.g. have a way to automatically insert f and its derivative compute using d? And does that even make sense if it is a function of multiple variables, in which case we need the derivative for each argument. (I think d already computes that as a non-documented feature, but should this be the design?)

I don't know what the right design is, I haven't thought about it enough--I'm sorry, but I'm thinking a bit slow these days as I'm super busy with other tasks--but I do think that something that handles general calls is necessary. Otherwise, I don't see how you handle cases like the example above.

@Konrad1991
Copy link
Owner Author

I prefer to force the user to add the derivative of f every time he calls d.

I'm sorry, but I'm thinking a bit slow these days as I'm super busy with other tasks

That is also true for me. Therefore, the code on Friday was not optimal...

The next step is to work on functions that get more than one argument. Currently, this is not working. I have a very messy solution, which I will improve now

@mailund
Copy link
Collaborator

mailund commented Nov 2, 2022

I'm agnostic on whether it is best to explicitly adding d(f) when you compute the derivative of f or whether it is automatically added, so I'm happy with both solutions.

One problem I see with handling multiple arguments is that you cannot get the number of arguments from the primitives. From the others you can, and you could easily insert the gradient in the list instead of what we do now with just a single argument. The d function can already compute the gradient, so it would could just be a list of variable-names to functions, I think.

@Konrad1991
Copy link
Owner Author

Konrad1991 commented Nov 2, 2022

r <- args(sin)
formalArgs(r) |> length()

This works also for primitives I think.

@mailund
Copy link
Collaborator

mailund commented Nov 2, 2022

Ah, maybe it does now. It's been a while since I worked with that :)

@Konrad1991
Copy link
Owner Author

I'm still working on the function d. However, I'm still struggling to find a proper solution.

  • So I refactored the fct-class.
fct <- setClass(
  "fct",
  slots = list(
    name = "character",
    dfdx = "function",
    name_deriv = "character"
  )
)
  • beyond that I worked on the function to apply the chain rule to the individual functions
diff_built_in_function_call <- lift(function(expr, x, fl) {
  # chain rule with a known function to differentiate. df/dx = df/dy dy/dx
  name <- call_name(expr)
  name_deriv <- get_derivative_name(fl, name)
  len <- length(expr)
  args <- sapply(seq_along(2:len), function(x) call_arg(expr, x))
  dy_dx <- sapply(args, function(as) diff_expr(as, x, fl) )
  outer_deriv <- do.call(call, c(name_deriv, args), quote = TRUE)
  entire_deriv <- NULL
  for(i in seq_along(dy_dx)) {
    inner_deriv <- dy_dx[[i]]
    entire_deriv = c(entire_deriv, bquote(.(inner_deriv) * .(outer_deriv)) )
  }
  for(i in seq_along(entire_deriv)) {
    deriv_current <- entire_deriv[[i]]
    deriv_current <- simplify_expr(deriv_current)
    if(deriv_current == 0) {
      entire_deriv[[i]] <- NA
    } else {
      entire_deriv[[i]] <- deriv_current
    }
  }
  entire_deriv <- entire_deriv[!is.na(entire_deriv)]
  if(len > 2) {
    entire_deriv <- paste(entire_deriv, collapse = "+")
  }
  str2lang(entire_deriv)
})

Is it really true that I need more than one derivative function if the original function receives more than one argument?

@mailund
Copy link
Collaborator

mailund commented Nov 6, 2022

For a multivariate function $f(y_1,y_2,\ldots,y_n):\mathbb{R}^n\to \mathbb{R}$ the chain rule is

$$ \frac{\mathrm{d}}{\mathrm{d}x}f(y_1(x),y_2(x),\ldots,y_n(x)) = \sum_{i=1}^n \frac{\mathrm{d}f}{\mathrm{d}y_i} \frac{\mathrm{d}y_i}{\mathrm{d}x} $$

so the chain rule wants the sum of the partial derivatives $\frac{\mathrm{d}f}{\mathrm{d}y_i}$ multiplied by the derivative of the input functions $\frac{\mathrm{d}y_i}{\mathrm{d}x}$.

It doesn't really matter if the $y_i:\mathbb{R}\to\mathbb{R}$ functions take more than one argument here, $y_i:\mathbb{R}^m\to\mathbb{R}$, since we are just taking the derivative with respect to $x$, but if we want other partial derivatives we need those as well, e.g., $\frac{\mathrm{d}y_i}{\mathrm{d}x_j}$. But just for one variable, we at least need the gradient

$$ \nabla f = \left[ \frac{\mathrm{d}f}{\mathrm{d}y_1}, \frac{\mathrm{d}f}{\mathrm{d}y_2}, \ldots, \frac{\mathrm{d}f}{\mathrm{d}y_n} \right]^T $$

For general functions $f:\mathbb{R}^n\to\mathbb{R}^m$ and $g:\mathbb{R}^k\to\mathbb{R}^n$ we need the Jacobi matrices $J_f$ and $J_g$, and the chain rule says that

$$ J_{f\circ g}(\mathbf{x})=J_f(g(\mathbf{x}))\cdot J_g(\mathbf{x}) $$

where $\cdot$ is matrix multiplication.

I don't think we need to consider gradients and Jacobi matrices as separate cases; they are just one versus more-than-one dimensions. And a univerate function $f:\mathbb{R}\to\mathbb{R}$ just has a $1\times 1$ Jacobi matrix.

If we say that the derivative we store in fct gives us the Jacobi matrix, $J_f$, then it works exactly as now for univariate functions (it returns a scalar that we can consider a one-by-one matrix), but if we get gradients or Jacobi matrices, those would work as well. The only tricky part, as far as I can see it, is mapping variable names to entries in the Jacobi matrix. Since R can switch parameters around, variables do not necessarily have a position. But if the Jacobi matrix has named rows and columns, maybe that would take care of it?

@Konrad1991
Copy link
Owner Author

Regarding the jacobi matrix, I would propose using a vector-based function as primary case. For instance:

f <- function(x) {
    y = numeric(length(x))
    y[1] = x[1]^2
    y[2] = x[2]*3
    return(y)
}

Could you give me an example where this approach does not work?

@Konrad1991
Copy link
Owner Author

Konrad1991 commented Nov 8, 2022

I have pondered over the jacobian function.

My approach would have the following design:

extractast <- function(code) {
  
  if(!is.call(code)) {
    return(code)
  }
  
  fct <- code[[1]]
  
  if( (as.name("<-") == fct) || (as.name("=") == fct) ) {
    
  } else if(as.name("[") == fct) {
    
  }
  
  code <- as.list(code)
  lapply(code, extractast)
}

#' Compute the jacobian-function of a function.
#'
#' Creates a function that computes the jacobian matrix of a function with respect to one parameter
#' and returns a matrix of these.
#'
#' @param f  A function
#' @param y The variable to compute the derivative for. \eq{dydx = ...}
#' @param x defines the dependent variable for which the variable shall be calculated. \eq{dxdx = ...}
#' @return  A function that computes the jacobian of f at any point.
#' @export
jacobian <- function(f, y, x) {
  stopifnot("the function is missing"=!is.null(f))
  stopifnot("the variable y is missing"=!is.null(y))
  stopifnot("the variable x is missing"=!is.null(x))
  
  brackets <- body(f)[[1]]
  body <- NULL
  if(brackets != as.name("{")) {
    body <- body(f)  
  } else {
    body <- body(f)[2:length(body(f))]   
  }
  for(i in seq_along(body)) {
    ast <- extractast(body[[i]])
  } 
}
  • find assignments specific for 'y'
  • either only 'y' = .... or y[1], y[2] etc.
  • otherwise throw error
    *check that user defines continuously subsets:
    • y[1], y[2], y[3] --> ok
    • y[1], y[2], y[4] --> not ok
  • based on subset/brackets the jacobian is constructed

    • if only var is found --> 1x1 Jacobian matrix --> just call 'd' for 'x'
    • otherwise:
      • for each y[1], y[2] etc. is found:
        • calculate dy[1]dx[1], dy[1]dx[2], ... dy[n]dx[n]
        • calculate dy[n]dx[1], dy[n]dx[2], ... dy[n]dx[m]
        • m == n
  • find assignments not related to 'y'

    • substitute these assignments:
        1. e.g.: a = 10 --> just keep this line
        1. e.g.: a = x[1] --> this line has to be substituted for example in: y[1] = a^2
  • using this approach it is possible to add if statements:
    * if no 'if' statement is found within the code:
    * just call 'd' for each line
    * otherwise: evaluate the function and the derivatives. For instance:

f <- function(t, x) {
        y <- numeric(length(x))
        a <- 2*x[2]
        if(t < 10) {
          a <- 3*x[2]
        }
        y[1] = x[1]^2
        y[2] = a
        return(y)
 }
      # this results in:
 jac <- function(t, x) {
      jac_mat <- matrix(0, length(x), length(x))
      y <- numeric(length(x))
      a <- 2*x[2]
      if(t < 10) {
        a <- 3*x[2]
      }
      y[1] = x[1]^2
      y[2] = a
      
      jac_mat[1, 1] = 2*x[1]
      jac_mat[2, 1] = 0
      jac_mat[1, 2] = 2
      jac_mat[2, 2] = 0
      
      return(jac_mat)
 }

What do you think about this approach?

@mailund
Copy link
Collaborator

mailund commented Nov 9, 2022

I am confused about what you are doing here.

If $f:\mathbb{R}^n\to\mathbb{R}^m$ we can write

$f(\mathbf{x})=\left(f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x})\right)$

where $\mathbf{x}$ is an $\mathbb{R}^n$ vector and $f_i:\mathbb{R}^n\to\mathbb{R}$.

The $f_i$ functions take vector input and a single real number as output, and we already have the code for computing the gradient of these

$\nabla f_i = \left(\frac{\partial f_i}{\partial x_1}, \frac{\partial f_i}{\partial x_2}, \ldots, \frac{\partial f_i}{\partial x_n}\right)$

To get the Jacobian, we just need a matrix that for each row has the $m$ different gradients

$$ \mathbf{J}(f) = \begin{bmatrix} \nabla f_1 \\ \nabla f_2 \\ \cdots \\ \nabla f_m \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n}\\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n}\\ & \cdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} $$

If we can already compute the gradient of a multi-variate function, perhaps restricted to some input parameters, we just need to compute this for all the output variables.

My thinking was that if a function returns a vector (in some form using c or list or such for its final result) then we could use the gradient code for each of the output parameters. E.g. if a function f returned

f <- function(x) {
    ...
    c(expr1, expr2, ..., exprm)
}

then we just needed to get the gradient for each or expr1, expr2, ..., exprm and make an n x m matrix out of it.

Does that make sense?

@Konrad1991
Copy link
Owner Author

I had in mind that the R code has to be translated to C++ code via my package ast2ast.
Thus, the approach with the derivative function works only on the R level.

I have to think about a solution for the problem.... although it is not directly related to dfdr.

So what is than missing for dfdr?
I would say that an important point now is to refacotr Rdiff_general_function_call.

@mailund
Copy link
Collaborator

mailund commented Nov 9, 2022

Since this function would generate an R function with all the $n\times m$ expressions for the Jacobi matrix, wouldn't it be possible to take that and generate the corresponding C++ code from the expressions?

@Konrad1991
Copy link
Owner Author

Konrad1991 commented Nov 9, 2022

This approach is very nice. However, I think it will not work if it gets a vector as input which is subsetted. How would we substitute these cases?

f <- function(x) {
   y[1] = ...
   ....
}

And we cannot use (at least the current version of 'df') for multiline code

@mailund
Copy link
Collaborator

mailund commented Nov 10, 2022

If y is the output vector, then we just differentiate the expression to the right of the assignment.

The current version cannot deal with multiple expressions, no, but it would not be difficult to deal with a limited number of assignments in the body. The problem is only if you assign to a variable that you then use in multiple expressions. Then you need to apply the chain rule everywhere, and keeping track of that could be difficult. Not impossible, of course, you could do all the substitutions before the differentiation and you would end up the right place. But anything like a loop where you assign

for (i in ...) {
   y[i] = x[i-1] * x[i + 1] * ... something
}

would be difficult to track.

If we restrict to assignments to constants and lists/arrays, I don't think it will be that difficult.

@Konrad1991
Copy link
Owner Author

Konrad1991 commented Nov 10, 2022

Sorry for the confusion with my last approach, I had the same in mind as you just described. However, the code was too complicated for this problem. I would restrict the user to constant vectors.

Should the code for this problem be included in 'dfdr' or should I integrate it in paropt or ast2ast?

@mailund
Copy link
Collaborator

mailund commented Nov 10, 2022

It could go either way. I think I can already handle the case with

f <- function(x, y, z) {
  c(x*y*z, x-x^y, x*y / z, ...)
}

(unless I forgot to commit it).

Changing it to

f <- function(x) { # x is now a vector
   c(x[1]*x[2]*x[3], x[1] - x[1]^x[2], x[1] * x[2] / x[3])
}

shouldn't be too difficult, if we require that the index in x[i] is always an integer. If it is a variable, there is more work involved, and we couldn't do the general case for that (since figuring out which variable x[i] is for any i is not computable). But if it is just a constant, we can do the derivative with expressions x[2] and check if an expression in the traversal matches quote(x[2]) the same way we match for variables now.

The difficult part would be if we attempted both generic variable function(x,y,z) and when some of them are vectors. For that, some sort of annotation would be necessary, I think.

Anyway, I think it could go in both packages, and it would probably be with the same difficulty.

@Konrad1991
Copy link
Owner Author

I would suggest integrating the code into 'dfdr' as not too much further work is needed to finish it.

I would define the following restrictions:

  • the user is only allowed to add scalar functions (new functions are not allowed to return vectors)
  • the user can only use integers within the [-function.
  • if, else if, else, for and while are not allowed

Then it would be fairly easy to implement it.

First, we had to check each line and find the variables on the left and right sides. If we find something different from y (the vector returned from the function) the variable has to be replaced within the following lines. Secondly, df can be called for each entry in the vector y

Should I implement it?

@mailund
Copy link
Collaborator

mailund commented Nov 11, 2022

That sounds like reasonable restrictions to me. We can always worry about vector output at another time. It would be great if you have time to implement it. I, unfortunately, am a bit overworked at the moment.

@Konrad1991
Copy link
Owner Author

I'm working on it.

Is there a way to call 'd' for x[1]?

Currently, all the index calls are replaced with e.g. x1 etc.
(it is checked whether x1 is used. In case the user has defined it another name is used)

Actually, it is also possible to use if, else if and else. The function is than a piece wise defined function (this is the way Mathematica and the wolfram language handle it). It is only necessary to keep the original code and add additional the derivatives.

@mailund
Copy link
Collaborator

mailund commented Nov 15, 2022

I'm working on it.

Is there a way to call 'd' for x[1]?

Not right now, but it is possible to get. It requires that we change the way the x parameter is evaluated. You need to capture it as an expression before it gets evaluated. You can do it with base R, but there are nice functions in rlang that handles it better. The enexpr() function gets an argument and returns a quoted version of it. That will translate the argument x[i] into the quoted expression.

parse_derivative_var <- function(x) {
  # capture x as a call
  x <- rlang::enexpr(x)
  if (rlang::call_name(x) == "[") {
    # x is a reference into an array, and we can get the array
    # and the index
    structure(rlang::call_args(x), names = c("array", "index"))
  } else {
    x # x is just the variable
  }
}

> var <- parse_derivative_var(x[2])
> var
$array
x

$index
[1] 2

If you want to call it indirectly, so you have a function that parses an argument to such a lazy evaluation function, you need to capture the expression with enexpr() and then pass it along with !!:

# to preserve the lazy eval, using !!enexpr(x)
f <- function(x) {
  # you cannot do parse_derivative_var(x)
  # but you can parse along x using !!enexpr(x)
  x <- parse_derivative_var(!!rlang::enexpr(x))
  x  
}

> f(x[4])
$array
x

$index
[1] 4

I don't know if we need that.

We can, however, capture the x argument and make it into an quoted expression that we can compare to other sub-expressions to determine when we have an index we are taking the derivative with.

df <- function(expr, x) {
  # we just take the expr here. no checking in this example
  x <- rlang::enexpr(x)
  
  # testing hack
  if (expr == x) print("equal") else print("unequal")
}

> df(quote(y[1]^2+y[2]), y[1])
[1] "unequal"
> df(quote(y[1]), y[1])
[1] "equal"

Currently, all the index calls are replaced with e.g. x1 etc. (it is checked whether x1 is used. In case the user has defined it another name is used)

I think using the quoted expressions for derivatives (probably after checking that they are valid indices) should be safer.

Actually, it is also possible to use if, else if and else. The function is than a piece wise defined function (this is the way Mathematica and the wolfram language handle it). It is only necessary to keep the original code and add additional the derivatives.

I haven't used Mathematica, but if it works there, then that is a great way to do it :)

@Konrad1991
Copy link
Owner Author

Now we can call d with x[1] etc. Furthermore, it is also possible to pass now x instead if "x" for argument x.

All tests run correctly except the Error handling test. I'm a bit puzzled about the purpose of diff_general_function_call.
I suggest that the function should check whether the used function exists in a specific list (e.g. vector, numeric, etc.). If the function is not in the list an error is thrown.

@mailund
Copy link
Collaborator

mailund commented Nov 16, 2022

I think my thought was simply that the differentiation function should throw an error if it encountered a function it didn't know how to deal with. We cannot use the chain rule unless we can differentiate the function calls in the expressions, so it shouldn't silently accept such expressions.

I think the right solution is to require that any function called in the expressions should be found in the list we have of derivatives. Wouldn't that work? Isn't that the same as what you are saying?

@Konrad1991
Copy link
Owner Author

ok, perfect that is exactly what I thought. I will remove the function and just throw an error if any function is found not included in fl.

@mailund
Copy link
Collaborator

mailund commented Nov 16, 2022

Sounds perfect!

@Konrad1991
Copy link
Owner Author

Most of the function to create the jacobian function is already implemented and I have added a bunch of tests. See the last pull request. However, I have found an issue that is a bit difficult to handle.

Suppose the user creates a vector to store the derivatives. For example: y <- numeric(2). This line is removed from the function d even when the user adds the function numeric to the list of known functions.

Any idea how to handle this?

In the context of the jacobian function, I would suggest that the line where numeric, vector etc. is found will be ignored.

@mailund
Copy link
Collaborator

mailund commented Dec 7, 2022

I honestly don't know. Is the use-case something like this?

f <- function(x) {
  y <- numeric(2)
  y[1] <- x**2
  y[2] <- 2*x
  y
}

@Konrad1991
Copy link
Owner Author

yes that is the case I was struggling with.

@Konrad1991
Copy link
Owner Author

The question is where this should be handled. One possibility would be that the function d would throw a warning if something like rep is encountered. The warning would tell the user that the function rep is not changed and has not been taken into account for calculating the derivatives. For instance:

f <- function(x)  x*rep(3.14, 1)

In order to do this an additional attribute has to be added to the fct_deriv list.

The other possibility is to handle this case only for the jacobian function. Meaning that the lines where functions such as numeric are found are kept unchanged. Probably it would be best to allow only one function at the rhs. I mean something like:

jac <- function(x) {
   y <- numeric(2) # allowed
   y <- numeric(2)*x[1] # not allowed
}

I personally prefer the first approach. As the second one is complicated to explain to the user and adds additional restrictions.

What is your opinion?

@mailund
Copy link
Collaborator

mailund commented Dec 8, 2022

I agree that a warning is probably the best solution. We definitely do not want to fail without warnings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants