---
title: "Group Data Documentation (wbpip)"
author: "Diana Garcia"
format: 
  html:
    number-sections: true
    theme: sandstone
    code-fold: true
    code-summary: "Show the code"
    toc: true
    link-external-newwindow: true
---

## Objective {.unnumbered}

This document describes the functions used in `wbpip` to calculate poverty and inequality statistics from grouped data.

## Structure {.unnumbered}

```{mermaid}
%%| fig-width: 6.5

flowchart RL
    lz(select_lorenz) --> stats[pip_stats]
    lq(pip_stats_lq) --> lz
    lb(pip_stats_lb) --> lz


    form_lq(functional_form_lq) --> lq
    est_lq(estimate_lq) --> lq
    fit_lq(fit_lq) --> lq
    derive_lq(derive_lq) --> lq
    
    regres(regres) --> lq
    regres(regres) --> lb
    
    form_lb(functional_form_lb) --> lb
    est_lb(estimate_lb) --> lb
    fit_lb(fit_lb) --> lb
    derive_lb(derive_lb) ---> lb


    check_val(check_curve_validity_lq) --> est_lq
    comp_dist_lq(compute_dist_stats_lq) --> est_lq
    comp_pov_lq(compute_pov_stats_lq) --> est_lq
    
    check_val_lb(check_curve_validity_lb) --> est_lb
    derive_lb(derive_lb) ---> check_val_lb
    ddlk(ddlk) --> check_val_lb
    
    comp_dist_lb(compute_dist_stats_lb) --> est_lb
    derive_lb --> comp_dist_lb
    value_at_lb --> comp_dist_lb
    gini_lb --> comp_dist_lb
    value_at_lb --> gini_lb
    polarization_lb --> comp_dist_lb
    derive_lb --> polarization_lb
    value_at_lb --> polarization_lb
    quantile_lb --> comp_dist_lb
    
    comp_pov_lb(compute_pov_stats_lb) --> est_lb
    headcount_lb --> comp_pov_lb
    rtSafe --> headcount_lb
    funcD --> rtSafe
    rtNewt --> rtSafe
    BETAI --> headcount_lb
    BETAICF --> BETAI
    GAMMLN --> BETAI
    pov_gap_lb --> comp_pov_lb
    pov_severity_lb --> comp_pov_lb
    
    value_at_lb --> fit_lb
    
    polarization_lq(<b>polarization_lq</b>) --> comp_dist_lq
    gini_lq(<b>gini_lq</b>) --> comp_dist_lq
    value_at_lq --> polarization_lq
    value_at_lq(value_at_lq) --> comp_dist_lq
    mld_lq(<b>mld_lq</b>) --> comp_dist_lq
    quantile_lq(<b>quantile_lq</b>) --> comp_dist_lq
    
    headcount_lq --> comp_pov_lq
    pov_gap_lq --> comp_pov_lq
    pov_severity_lq --> comp_pov_lq
    watts_lq --> comp_pov_lq

    

    derive_lq --> comp_dist_lq
    derive_lq --> mld_lq
    derive_lq --> polarization_lq
    value_at_lq --> quantile_lq
    value_at_lq --> pov_gap_lq
    value_at_lq --> pov_severity_lq
    value_at_lq --> fit_lq
    
    style derive_lq fill: #50C878
    style value_at_lq fill: #50C878
    style check_val fill: #50C878
    style form_lq fill: #50C878
    style regres fill: #50C878
    style gini_lq fill: #FFF1A2
    style polarization_lq fill: #FFF1A2
    style quantile_lq fill: #FFA500
    style mld_lq fill: #FFA500
    style comp_dist_lq fill: #FFF1A2
    style comp_pov_lq fill: #FFF1A2
    style headcount_lq fill: #FFF1A2
    style pov_gap_lq fill: #FFF1A2
    style pov_severity_lq fill: #FFF1A2
    style watts_lq fill: #FFF1A2
    
    style form_lb fill: #50C878
    style derive_lb fill: #FFF1A2
    style est_lb fill: #50C878
    style check_val_lb fill: #FFF1A2 
    style comp_pov_lb fill: #FFF1A2
    style ddlk fill: #FFF1A2
    style headcount_lb fill: #FFF1A2
    style BETAI fill: #FFA500
    style BETAICF fill: #50C878
    style GAMMLN fill: #FFA500
    style rtSafe fill: #50C878
    style funcD fill: #50C878
    style rtNewt fill: #50C878

```

## Functions

### pip_stats {#sec-pip_stats}

*Description*: Compute poverty statistics for grouped data by selecting the best functional fit for the Lorenz curve (either beta or quadratic)

```{r}
gd_compute_pip_stats <- function(welfare,
                                 povline,
                                 population,
                                 requested_mean,
                                 popshare = NULL,
                                 default_ppp = 1,
                                 ppp = NULL,
                                 p0 = 0.5) {


  # Apply Lorenz quadratic fit ----------------------------------------------
  results_lq <- gd_compute_pip_stats_lq(
    welfare = welfare,
    population = population,
    requested_mean = requested_mean,
    povline = povline,
    popshare = popshare,
    default_ppp = default_ppp,
    ppp = ppp,
    p0 = p0
  )

  # Apply Lorenz beta fit ----------------------------------------------
  results_lb <- gd_compute_pip_stats_lb(
    welfare = welfare,
    population = population,
    requested_mean = requested_mean,
    povline = povline,
    popshare = popshare,
    default_ppp = default_ppp,
    ppp = ppp,
    p0 = p0
  )


  # Apply selection rules ---------------------------------------------------
  out <- gd_select_lorenz(
    lq = results_lq,
    lb = results_lb
  )

  # Return only subset of variables
  out <- out[c(
    "poverty_line",
    "mean",
    "median",
    "headcount",
    "poverty_gap",
    "poverty_severity",
    "watts",
    "gini",
    "mld",
    "polarization",
    "deciles"
  )]


  return(out)
}
```

::: {#nte-boundary-issue .callout-note}
#### Issue: Boundary conditions

At the moment, the boundary values `z_min` and `z_max` are calculated as $(\mu L'(0.001) + 4,\ \mu L'(0.98) - 4)$ (*Note: the reason for adding and subtracting 4 is unclear*).
:::

### select_lorenz {#sec-select_lorenz}

*Description*: Select best Lorenz fit and adjust the returned statistics if needed.

```{r}
gd_select_lorenz <- function(lq, lb) {

  # Set default value
  datamean <- lq[["mean"]]
  is_valid <- lq[["is_valid"]] | lb[["is_valid"]]
  is_normal <- lq[["is_normal"]] | lb[["is_normal"]]

  # Selection of Lorenz fit for poverty statistics
  use_lq_for_pov <- use_lq_for_poverty(
    lq = lq,
    lb = lb
  )

  # Selection of Lorenz fit for distributional statistics
  use_lq_for_dist <- use_lq_for_distributional(
    lq = lq,
    lb = lb
  )

  # Retrieve distributional statistics
  dist <- retrieve_distributional(
    lq = lq,
    lb = lb,
    is_valid = is_valid,
    use_lq_for_dist = use_lq_for_dist
  )

  # Retrieve poverty statistics
  pov <- retrieve_poverty(
    lq = lq,
    lb = lb,
    is_normal = is_normal,
    use_lq_for_pov = use_lq_for_pov
  )

  return(list(
    mean             = datamean,
    poverty_line     = pov[["poverty_line"]],
    z_min            = dist[["z_min"]],
    z_max            = dist[["z_max"]],
    # ppp            = lq[["ppp"]],
    gini             = dist[["gini"]],
    median           = dist[["median"]],
    # rmed           = rmed,
    rmhalf           = dist[["rmhalf"]],
    polarization     = dist[["polarization"]],
    ris              = dist[["ris"]],
    mld              = dist[["mld"]],
    dcm              = lq[["dcm"]],
    deciles          = dist[["deciles"]],
    headcount        = pov[["headcount"]],
    poverty_gap      = pov[["poverty_gap"]],
    poverty_severity = pov[["poverty_severity"]],
    eh               = pov[["eh"]],
    epg              = pov[["epg"]],
    ep               = pov[["ep"]],
    gh               = pov[["gh"]],
    gpg              = pov[["gpg"]],
    gp               = pov[["gp"]],
    watts            = pov[["watts"]],
    sse              = dist[["sse"]]
  ))
}

```

### pip_stats_lq

*Description*: Compute poverty statistics for grouped data using the quadratic functional form of the Lorenz curve.

```{r}
gd_compute_pip_stats_lq <- function(welfare,
                                    povline,
                                    population,
                                    requested_mean,
                                    popshare = NULL,
                                    default_ppp,
                                    ppp = NULL,
                                    p0 = 0.5) {

  # Adjust mean if different PPP value is provided
  if (!is.null(ppp)) {
    requested_mean <- requested_mean * default_ppp / ppp
  } else {
    ppp <- default_ppp
  }
  # STEP 1: Prep data to fit functional form
  prepped_data <- create_functional_form_lq(
    welfare = welfare,
    population = population
  )

  # STEP 2: Estimate regression coefficients using LQ parameterization
  reg_results <- regres(prepped_data, is_lq = TRUE)
  reg_coef <- reg_results$coef

  A <- reg_coef[1]
  B <- reg_coef[2]
  C <- reg_coef[3]

  # Step 2.1: pre-calculate key values
  kv <- gd_lq_key_values(A, B, C)

  # OPTIONAL: Only when popshare is supplied
  # return poverty line if share of population living in poverty is supplied
  # instead of a poverty line
  if (!is.null(popshare)) {
    povline <- derive_lq(popshare,
                         A, B, C,
                         key_values = kv) * requested_mean
  }

  # Boundary conditions (Why 4?)
  z_min <- requested_mean * derive_lq(0.001,
                                      A, B, C,
                                      key_values = kv) + 4
  z_max <- requested_mean * derive_lq(0.980,
                                      A, B, C,
                                      key_values = kv) - 4
  z_min <- if (z_min < 0) 0L else z_min

  results1 <- list(requested_mean, povline, z_min, z_max, ppp)
  names(results1) <- list("mean", "poverty_line", "z_min", "z_max", "ppp")

  # STEP 3: Estimate poverty measures based on identified parameters
  results2 <- gd_estimate_lq(requested_mean, povline, p0,
                             A, B, C, key_values = kv)

  # STEP 4: Compute measure of regression fit
  results_fit <- gd_compute_fit_lq(welfare,
                                   population,
                                   results2$headcount,
                                   A, B, C,
                                   key_values = kv)

  res <- c(results1,
           results2,
           results_fit,
           reg_results)

  return(res)
}

```

### functional_form_lq

*Description*: Prepares data for regression of $y(1-y)$ on $(x^2-y)$, $y(x-1)$ and $(x-y)$.

```{r}
create_functional_form_lq <- function(welfare,
                                      population) {
  # CHECK inputs
  # assertthat::assert_that(is.numeric(population))
  # assertthat::assert_that(is.numeric(welfare))
  # assertthat::assert_that(length(population) == length(welfare))
  # assertthat::assert_that(length(population) > 1)

  # Remove last observation (the functional form for the Lorenz curve already
  # forces it to pass through the point (1, 1))
  nobs <- length(population) - 1
  population <- population[1:nobs]
  welfare <- welfare[1:nobs]

  # L(1-L)
  y <- welfare * (1 - welfare)
  # (P^2-L)
  x1 <- population^2 - welfare
  # L(P-1)
  x2 <- welfare * (population - 1)
  # P-L
  x3 <- population - welfare

  return(list(y = y, X = cbind(x1, x2, x3)))

}

```

*Note*: The last observation of (x,y), which by construction has the value (1, 1), is excluded since the functional form for the Lorenz curve already forces it to pass through the point (1, 1).

*References*: \[1\]

*Relevant equations*:

The general Quadratic Lorenz curve form is:

$$ax^2 + bxy + cy^2 + dx + ey + f = 0$$ {#eq-gen-equation}

where $y$ is the cumulative proportion of consumption/income (L) and $x$ is the cumulative proportion of the population (P). Using the normalization $c = 1$ together with the conditions $f=0$ and $e = -(a+b+d+1)$, which force the curve through $(0, 0)$ and $(1, 1)$, the previous equation is rewritten in a linear form as follows (Equation 15 in Villasenor et al, 1989):

$$y(1-y) = a(x^2-y) + by(x-1) + d(x-y)$$ {#eq-linear-form}

This function prepares data to estimate $a$, $b$, and $d$ (*Note*: $d$ is named $C$ in `wbpip`).
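
As a minimal sketch of how @eq-linear-form can be estimated, the block below generates points from a Quadratic Lorenz curve with known coefficients, builds the three regressors, and recovers the coefficients with base R `lm()` without an intercept. The coefficient values and the use of `lm()` are assumptions made for illustration; `wbpip` relies on its own `regres()` function, which is not reproduced here.

```{r}
# Hypothetical coefficients, chosen so that the curve is a valid Lorenz curve
a <- 0.9
b <- -1.2
d <- 0.2
e <- -(a + b + d + 1)
m <- b^2 - 4 * a
n <- 2 * b * e - 4 * d

# Cumulative population shares (the point (1, 1) is left out, as in the
# function above) and the implied cumulative welfare shares
x <- seq(0.1, 0.9, by = 0.1)
y <- -0.5 * (b * x + e + sqrt(m * x^2 + n * x + e^2))

# Dependent variable and regressors of the linear form
dep <- y * (1 - y)
x1  <- x^2 - y
x2  <- y * (x - 1)
x3  <- x - y

# Regression through the origin; the estimates should be close to (a, b, d)
fit <- lm(dep ~ 0 + x1 + x2 + x3)
coef_hat <- unname(coef(fit))
coef_hat
```

Because the points lie exactly on the curve, the fit is exact up to floating-point error; with real grouped data the estimates only approximate the underlying curve.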

### value_at_lq

*Description*: Computes the value of the Quadratic Lorenz curve at `x`.

```{r}
value_at_lq <- function(x, A, B, C, key_values) {

  # Check for NA, Inf and negative values in x
  check_NA_Inf_values(x)
  check_neg_values(x)

  # Calculations
  # e <- -(A + B + C + 1)
  # m <- (B^2) - (4 * A)
  # n <- (2 * B * e) - (4 * C)
  temp <- (key_values$m * x^2) + (key_values$n * x) + (key_values$e^2)
  temp[temp < 0] <- 0

  # Solving the equation of the Lorenz curve
  estle <- -0.5 * ((B * x) + key_values$e + sqrt(temp))

  return(estle)
}
```

*References*: \[1\]

*Relevant equations*:

This function calculates the value of the Quadratic Lorenz curve. Solving @eq-gen-equation for $y$, with $c = 1$, $f=0$ and $e = -(a+b+d+1)$, gives the root that describes the Lorenz curve (Equation 6b in Villasenor et al, 1989):

$$ y = \frac{-(bx+e) - (\alpha x^2 + \beta x + e^2)^{1/2}}{2} $$ {#eq-solve-y}

where $\alpha = b^2 -4a$ and $\beta = 2be - 4d$.
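
The sketch below evaluates this expression at a few points. The helper names and the coefficient values are hypothetical; `gd_lq_key_values()` is not shown in this document, so the helper only reproduces the three quantities documented in the code comments (`e`, `m`, `n`).

```{r}
# Hypothetical stand-in for the key values used by value_at_lq()
lq_key_values_sketch <- function(A, B, C) {
  e <- -(A + B + C + 1)
  m <- B^2 - 4 * A
  n <- 2 * B * e - 4 * C
  list(e = e, m = m, n = n)
}

# Value of the Quadratic Lorenz curve at x
lq_value_sketch <- function(x, A, B, C) {
  kv <- lq_key_values_sketch(A, B, C)
  # Negative values under the square root are truncated to zero, as in the code
  disc <- pmax(kv$m * x^2 + kv$n * x + kv$e^2, 0)
  -0.5 * (B * x + kv$e + sqrt(disc))
}

# With e < 0 and A + C >= 1 the curve starts at 0 and ends at 1
vals <- lq_value_sketch(c(0, 0.5, 1), A = 0.9, B = -1.2, C = 0.2)
vals
```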

### derive_lq

*Description*: Returns the first derivative of the Quadratic Lorenz curve with $c = 1$.

```{r}
derive_lq <- function(x, A, B, C, key_values) {

  if (is.null(key_values)) {
    key_values <- gd_lq_key_values(A, B, C)
    # e          <- key_values$e
    # m          <- key_values$m
    # n          <- key_values$n
  }

  if (anyNA(x) == TRUE) {
    cli::cli_abort("`x' must be a numeric or integer vector")
  }
  # note:
  #   alpha --> m
  #   beta  --> n

  # e <- -(A + B + C + 1)
  # m <- (B^2) - (4 * A)
  # n <- (2 * B * e) - (4 * C) # C is called D in original paper, but C in Datt paper
  tmp <- (key_values$m * x^2) + (key_values$n * x) + (key_values$e^2)
  tmp[(!is.na(tmp) & tmp < 0)] <- 0 # If tmp == 0, val = Inf.

  # Formula for first derivative of GQ Lorenz Curve
  val <- -(B / 2) - ((2 * key_values$m * x + key_values$n) / (4 * sqrt(tmp)))

  return(val)
}
```

*References*: \[1\]

*Relevant equations*:

This function computes the first derivative of @eq-solve-y:

$$ y' = -\frac{b}{2} - \frac{2 \alpha x + \beta}{4\sqrt{\alpha x^2 + \beta x + e^2}} $$ {#eq-first-der}
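
A quick way to check the derivative is to compare it with a central finite difference of the curve. The coefficients below are hypothetical and the functions are standalone stand-ins, not the `wbpip` implementations.

```{r}
# Hypothetical coefficients of a valid Quadratic Lorenz curve
A <- 0.9; B <- -1.2; C <- 0.2
e <- -(A + B + C + 1)
m <- B^2 - 4 * A
n <- 2 * B * e - 4 * C

# Curve and analytical first derivative
lorenz   <- function(x) -0.5 * (B * x + e + sqrt(m * x^2 + n * x + e^2))
d_lorenz <- function(x) -(B / 2) - (2 * m * x + n) / (4 * sqrt(m * x^2 + n * x + e^2))

# Central finite difference at a few interior points
x  <- c(0.1, 0.5, 0.9)
h  <- 1e-6
fd <- (lorenz(x + h) - lorenz(x - h)) / (2 * h)

# The two should agree up to numerical error
max_abs_diff <- max(abs(fd - d_lorenz(x)))
max_abs_diff
```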

### estimate_lq

*Description*: Estimates poverty and inequality stats from Quadratic Lorenz fit

```{r}
gd_estimate_lq <- function(mean, povline, p0, A, B, C, key_values) {

  if (is.null(key_values)) {
    key_values <- gd_lq_key_values(A, B, C)
  }
  e  <- key_values$e
  m  <- key_values$m
  n  <- key_values$n
  r  <- key_values$r
  s1 <- key_values$s1
  s2 <- key_values$s2

  validity <- check_curve_validity_lq(A, B, C, key_values = key_values)
                                      #e, m, n, r^2)
  if (validity$is_valid == FALSE & validity$is_normal == FALSE) {
    return(empty_gd_compute_pip_stats_response)
  }

  # Compute distributional measures -----------------------------------------
  dist_stats <- gd_compute_dist_stats_lq(mean, p0, A, B, C, key_values = key_values)

  # Compute poverty stats ---------------------------------------------------
  pov_stats  <- gd_compute_poverty_stats_lq(mean, povline, A, B, C, key_values = key_values)

  out <- list(
    gini = dist_stats$gini,
    median = dist_stats$median,
    rmhalf = dist_stats$rmhalf,
    polarization = dist_stats$polarization,
    ris = dist_stats$ris,
    mld = dist_stats$mld,
    dcm = dist_stats$dcm,
    deciles = dist_stats$deciles,
    headcount = pov_stats$headcount,
    poverty_gap = pov_stats$pg,
    poverty_severity = pov_stats$p2,
    eh = pov_stats$eh,
    epg = pov_stats$epg,
    ep = pov_stats$ep,
    gh = pov_stats$gh,
    gpg = pov_stats$gpg,
    gp = pov_stats$gp,
    watts = pov_stats$watts,
    dl = pov_stats$dl,
    ddl = pov_stats$ddl,
    is_normal = validity$is_normal,
    is_valid = validity$is_valid
  )

  return(out)
}

```

### check_curve_validity_lq

*Description*: Check validity of Lorenz Quadratic fit

```{r}
check_curve_validity_lq <- function(A, B, C, key_values) {
  is_normal <- FALSE
  is_valid <- FALSE
  r <- (key_values$r)^2 # formerly, the input to the func was r^2

  # r needs to be > 0 because need to extract sq root
  if (r < 0) { # now that r is squared, this will never be TRUE
    return(list( # but r^2 was used as input before `key_values` so
      is_normal = is_normal, # this was already never executed
      is_valid = is_valid
    ))
  }

  if (key_values$e > 0 || C < 0) {
    return(list(
      is_normal = is_normal,
      is_valid = is_valid
    ))
  }

  # Failure conditions for checking theoretically valid Lorenz curve
  # Found in section 4 of Datt computational tools paper
  cn1 <- key_values$n^2
  cn3 <- cn1 / (4 * key_values$e^2)

  if (!((key_values$m < 0) |
    ((key_values$m > 0) & (key_values$m < cn3) & (key_values$n >= 0)) |
    ((key_values$m > 0) & (key_values$m < -key_values$n / 2) & (key_values$m < cn3)))) {
    return(list(
      is_normal = is_normal,
      is_valid = is_valid
    ))
  }

  is_normal <- TRUE
  is_valid <- (A + C) >= 0.9

  return(list(
    is_normal = is_normal,
    is_valid  = is_valid
  ))
}

```

*References*: \[1\], \[2\], \[3\]

*Relevant equations*:

The function tests specific assumptions for the Lorenz Quadratic and relies on the formulas from Table 2 in Datt, G (1998) \[3\]. The nomenclature for some of these formulas differs from Villasenor et al (1989), so this is how they match:

$m = \alpha = b^2 -4a$

$n = \beta = 2be -4c$ ($c=d$ for Villasenor et al, 1989)

$r = K*2\alpha = (n^2 - 4me^2)^\frac{1}{2}$

::: {#nte-cond-r-issue .callout-note}
#### Issue: Rename $r$

At the moment, $r$ refers to $(n^2 - 4me^2)$ in `wbpip`. This is already fixed on commit `47852c` in branch `fix_key_values` by Zander (Waiting for merge). We should maybe rename this within this function.
:::

The conditions this function tests are presented in Section 4 of Datt (1998):

Normality and Validity

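-   $(n^2 - 4me^2)>0$, so that the square root in $r$ is real (Normality and validity)
-   $e \leq 0$ and $c \geq 0$, so that $L(0) = 0$ and $L'(0^{+}) \geq 0$; the code rejects the fit when $e > 0$ or $C < 0$ (Normality and validity)
-   $a+d \geq 0.9$, the threshold used in the code for the condition that gives $L(1) = 1$ (Validity)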

::: {#imp-condad .callout-important}
#### Issue: The inequality for $a+d$

At the moment, most tests are designed for the old validation $a+d \leq 1$. However, the Corrigendum to the original paper \[2\] indicates that we should use $a+d \geq 1$.
:::
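
The link between $a + d$ and $L(1) = 1$ can be checked directly from @eq-solve-y. The short derivation below is added for illustration and is not taken from the cited papers. At $x = 1$ the term under the square root is a perfect square:

$$\begin{aligned}
\alpha + \beta + e^2 &= (b+e)^2 - 4(a+d) = (a+d+1)^2 - 4(a+d) = (a+d-1)^2 \\
L(1) &= \frac{-(b+e) - |a+d-1|}{2} = \frac{(a+d+1) - |a+d-1|}{2}
\end{aligned}$$

so $L(1) = 1$ when $a + d \geq 1$, and $L(1) = a + d$ otherwise.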

Finally, $L''(x) \geq 0$ for $x$ within $(0,1)$ requires one of the following (the inequalities are written as they appear in the code):

-   $m < 0$ (condition in Villasenor et al., 1989)
-   $m > 0$, $m < n^2/(4 e^2)$ and $n \geq 0$ (condition from Datt, 1998)
-   $m > 0$, $m < n^2/(4 e^2)$ and $m < -n/2$ (condition from Datt, 1998)

$L''(x)$ is obtained by differentiating @eq-first-der:

$$ L''(x) = \frac{\beta^2 -4\alpha e^2}{8(\alpha x^2 + \beta x + e^2)^{3/2}} = \frac{n^2 - 4me^2}{8(mx^2 +nx +e^2)^{3/2}}=\frac{r^2}{8(mx^2 +nx +e^2)^{3/2}} $$
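
The three alternative convexity conditions can be written as one logical test. The helper below is a hypothetical sketch (the name `lq_is_convex_sketch` is not part of `wbpip`) that mirrors the expression used inside `check_curve_validity_lq`, with `m`, `n` and `e` passed directly.

```{r}
# TRUE when at least one of the three convexity conditions holds
lq_is_convex_sketch <- function(m, n, e) {
  bound <- n^2 / (4 * e^2)
  (m < 0) ||
    (m > 0 && m < bound && n >= 0) ||
    (m > 0 && m < -n / 2 && m < bound)
}

# Hypothetical inputs: first condition, second condition, and a failing case
ok_negative_m <- lq_is_convex_sketch(m = -2.16, n = 1.36, e = -0.9)
ok_positive_m <- lq_is_convex_sketch(m = 0.5, n = 2, e = -0.5)
not_convex    <- lq_is_convex_sketch(m = 5, n = 1, e = -0.5)
c(ok_negative_m, ok_positive_m, not_convex)
```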

### compute_dist_stats_lq

*Description*: Computes distributional stats from Lorenz Quadratic fit

```{r}
gd_compute_dist_stats_lq <- function(mean, p0, A, B, C, key_values = key_values) {

  gini    <- gd_compute_gini_lq(A, B, C,
                                key_values = key_values)
  median  <- mean * derive_lq(0.5, A, B, C,
                              key_values = key_values)
  rmhalf  <- value_at_lq(p0, A, B, C,
                         key_values = key_values) * mean / p0 # mean welfare of the bottom p0 share
  dcm     <- (1 - gini) * mean
  pol     <- gd_compute_polarization_lq(mean, p0, dcm, A, B, C,
                                        key_values = key_values)
  ris     <- value_at_lq(0.5, A, B, C,
                         key_values = key_values)
  mld     <- gd_compute_mld_lq(A, B, C,
                               key_values = key_values)
  deciles <- gd_compute_quantile_lq(A, B, C,
                                    key_values = key_values)

  return(list(
    gini         = gini,
    median       = median,
    rmhalf       = rmhalf,
    dcm          = dcm,
    polarization = pol,
    ris          = ris,
    mld          = mld,
    deciles      = deciles
  ))
}
```

::: {#nte-check1 .callout-note}
#### Check:

How does this function differ from `gd_estimate_dist_stats_lq`?
:::

### quantile_lq

*Description*: Compute quantiles from Lorenz Quadratic fit

```{r}
old_gd_compute_quantile_lq <- function(A, B, C, n_quantile = 10) {
  vec <- vector(mode = "numeric", length = n_quantile)
  x1 <- 1 / n_quantile
  q <- 0L
  lastq <- 0L
  for (i in seq_len(n_quantile - 1)) {
    q <- value_at_lq(x1, A, B, C)
    v <- q - lastq
    vec[i] <- v
    lastq <- q
    x1 <- x1 + 1 / n_quantile
  }
  vec[n_quantile] <- 1 - lastq

  return(vec)
}
```

*Note*: This function calculates the quantiles for a Lorenz Quadratic with specific values for $a$, $b$ and $d$.

::: {#imp-quant-gd-issue .callout-important}
#### Issues:

The description of this function says that it calculates the quantiles (deciles) of the fitted distribution for given values of $a$, $b$ and $d$, but it actually calculates the welfare "share" held between consecutive deciles. If I understand correctly, the curve being evaluated is the Lorenz curve and the shares correspond to the `deciles` of the welfare vector.

The last decile is computed manually by subtracting the cumulative share at the second-to-last decile from 1:

`vec[n_quantile] <- 1 - value_at_lq(x[n_quantile-1], A, B, C)` where `n_quantile = 10`

However, my hypothesis is that this is related to *Issue: The inequality for* $a+d$ above. If $A+C \geq 1$ then `value_at_lq(1, A, B, C) = 1`. We can see the case in this [Desmos graph](https://www.desmos.com/calculator/g164gdepzk){.external target="_blank"}.
:::
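
To make the distinction between shares and quantiles concrete, the sketch below computes both for one set of hypothetical coefficients and a hypothetical mean of 100. The functions are standalone stand-ins for `value_at_lq` and `derive_lq`.

```{r}
# Hypothetical coefficients (a valid Quadratic Lorenz curve) and mean
A <- 0.9; B <- -1.2; C <- 0.2
mu <- 100
e <- -(A + B + C + 1)
m <- B^2 - 4 * A
n <- 2 * B * e - 4 * C

lorenz   <- function(p) -0.5 * (B * p + e + sqrt(pmax(m * p^2 + n * p + e^2, 0)))
d_lorenz <- function(p) -(B / 2) - (2 * m * p + n) / (4 * sqrt(m * p^2 + n * p + e^2))

p <- seq(0.1, 1, by = 0.1)

# Welfare share held by each decile group (what the function returns)
decile_shares <- diff(c(0, lorenz(p)))

# Welfare level at the nine interior decile cut-offs (quantiles in the usual sense)
decile_values <- mu * d_lorenz(p[1:9])

decile_shares
decile_values
```

Here $A + C \geq 1$, so the curve reaches 1 at $p = 1$ and the ten shares sum to 1 without any manual adjustment of the last decile.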

*My version of the code*:

```{r}
gd_compute_quantile_lq <- function(A, B, C, n_quantile = 10) {

  x   <- seq(from = 1/n_quantile, to = 1, by = 1/n_quantile)

  vec <- diff(c(0,value_at_lq(x, A, B, C)))

  vec[n_quantile] <- 1- value_at_lq(x[n_quantile-1], A, B, C) # Is this correct?

  return(vec)
}

```

### mld_lq

*Description*: Computes Mean Log Deviation from Lorenz Quadratic fit

```{r}
old_gd_compute_mld_lq <- function(A, B, C) {
  x1 <- derive_lq(0.0005, A, B, C)
  gap <- 0L
  mld <- 0L
  if (x1 == 0) {
    gap <- 0.0005
  } else {
    mld <- suppressWarnings(log(x1) * 0.001)
  }
  x1 <- derive_lq(0, A, B, C)
  for (xstep in seq(0, 0.998, 0.001)) {
    x2 <- derive_lq(xstep + 0.001, A, B, C)
    if ((x1 <= 0) || (x2 <= 0)) {
      gap <- gap + 0.001
      if (gap > 0.5) {
        return(-1)
      }
    } else {
      gap <- 0L
      mld <- mld + (log(x1) + log(x2)) * 0.0005
    }
    x1 <- x2
  }
  return(-mld)
}
```

*References*: None

*Note*: The source does not describe how this statistic is calculated. The following is my hypothesis of what the function is trying to accomplish:

*Relevant equations*:

The mean log deviation:

$$-\frac{1}{N} \sum_{i=1}^N \ln\left(\frac{y_i}{\mu}\right)=-\frac{1}{N} \sum_{i=1}^N \ln\left(\frac{y_i}{\frac{1}{N}\sum_{j=1}^N y_j}\right) $$

Using the derivation from Rohde (2008) (Equation 15.20), we know that:

$$L'(\pi) = \frac{N y_k}{\sum_{i=1}^N y_i}$$

where $y_k$ is the income accruing to the $k^{th}$ individual when incomes are ordered such that $y_1<y_2<...<y_N$, and $\pi= \frac{k}{N}$. Then the mean log deviation can be rewritten as (Equation 15.21):

$$-\int_0^1 \ln(L'(\pi))d\pi= \lim_{N \rightarrow \infty} -\sum_{k=1}^{N}\frac{1}{N}\ln\left(\frac{N y_k}{\sum_{i=1}^N y_i}\right)$$

::: {#imp-mld-gd-issue .callout-important}
#### Issue:

My hypothesis is that they used the last formula to calculate the Mean Log Deviation. I am still unsure why some rules are applied at the lower end, more specifically why the function returns $-1$ when the derivative is non-positive over a long stretch of the left tail (the old code returns `-1` once the accumulated `gap` exceeds 0.5).
:::
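
The integral form can be illustrated on a curve whose answer is known. The sketch below uses the hypothetical Lorenz curve $L(p) = p^2$, for which $L'(p) = 2p$ and $-\int_0^1 \ln(2p)\,dp = 1 - \ln 2$. It uses a midpoint rule with 1000 steps, which is an assumption made to avoid evaluating $\ln(0)$; the `wbpip` code uses a trapezoid-style sum with step 0.001 instead.

```{r}
# Hypothetical Lorenz curve L(p) = p^2, so L'(p) = 2p
d_lorenz <- function(p) 2 * p

# Midpoint rule on [0, 1]
n_steps <- 1000
mid     <- (seq_len(n_steps) - 0.5) / n_steps
mld_numeric <- -sum(log(d_lorenz(mid))) / n_steps

# Closed-form value for this curve
mld_exact <- 1 - log(2)

c(numeric = mld_numeric, exact = mld_exact)
```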

*My version of the code*:

```{r}
gd_compute_mld_lq <- function(A, B, C) {
  x1 <- derive_lq(0.0005, A, B, C) 
  mld <- 0L
  if (x1 != 0) { 
    mld <- suppressWarnings(log(x1) * 0.001) # Needed to match test
  }

  xstep <- seq(0, 0.999, 0.001)
  x <- derive_lq(xstep, A, B, C)

  if (any(x[1:33]<=0)){ # To account for negative values 
    return(-1)
  }else{
    mld <- mld + fsum( (log(x[1:999])+log(x[2:1000])) *0.0005) 
    return(-mld)
  }
}
```

### gini_lq

*Description*: Compute Gini index from Lorenz Quadratic fit.

```{r}
gd_compute_gini_lq <- function(A, B, C, key_values) {

  # For the GQ Lorenz curve, the Gini formula are valid under the condition A+C>=1
  # P.isValid <- (A + C) >= 0.9
  # P.isNormal <- TRUE

  e1 <- abs(A + C - 1)
  e2 <- 1 + (B / 2) + key_values$e

  tmp1 <- key_values$n * (B + 2) / (4 * key_values$m)
  tmp2 <- (key_values$r^2) / (8 * key_values$m)
  tmp3 <- (2 * key_values$m) + key_values$n

  if (key_values$m > 0) {
    # tmpnum <- tmp3 + 2 * sqrt(m) * abs(e)
    # tmpden <- n - 2 * abs(e) * sqrt(m)

    # Formula from Datt paper
    # CHECK that code matches formulas in paper
    gini <- e2 + (tmp3 / (4 * key_values$m)) * e1 - (key_values$n * abs(key_values$e) / (4 * key_values$m)) - ((key_values$r^2) / (8 * sqrt(key_values$m)^3)) *
      log(abs(((tmp3 + (2 * sqrt(key_values$m) * e1))) / (key_values$n + (2 * sqrt(key_values$m) * abs(key_values$e)))))
    # P.gi <- (e/2) - tmp1 - (tmp2 * log(abs(tmpnum/tmpden)) / sqrt(m))
  } else {
    tmp4 <- ((2 * key_values$m) + key_values$n) / key_values$r
    tmp4 <- if (tmp4 < -1) -1 else tmp4
    tmp4 <- if (tmp4 > 1) 1 else tmp4

    # Formula does not match with paper
    gini <- e2 +
      (tmp3 / (4 * key_values$m)) *
      e1 - (key_values$n * abs(key_values$e) / (4 * key_values$m)) +
      (tmp2 * (asin(tmp4) - asin(key_values$n / key_values$r)) / sqrt(-key_values$m))
    # P.gi <- (e/2) - tmp1 + ((tmp2 * (asin(tmp4) - asin(n/r))) / sqrt(-m))
  }

  return(gini)
}
```

*References*: \[3\]

*Relevant equations*:

-   if $m<0$:

$$ \frac{e}{2} - \frac{n (b+2)}{4m} + \frac{r^2}{8m\sqrt{-m}}\left[ \sin^{-1} \frac{2m + n}{r} - \sin^{-1} \frac{n}{r} \right]$$

-   if $m>0$:

$$  \frac{e}{2} - \frac{n (b+2)}{4m} - \frac{r^2}{8m\sqrt{m}} \ln \left[\left|{\frac{2m+n+2\sqrt{m} (a+c-1)}{n-2e\sqrt{m}}}\right| \right] $$ 

::: {#imp-gini-gd-issue .callout-important}
#### Issue:

The formula in the code does not match the Gini formula in the Datt paper, as mentioned previously by Tony.

:::
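
One way to investigate the mismatch is to compare a closed form with the definition of the Gini index as one minus twice the area under the Lorenz curve. The sketch below does this for one set of hypothetical coefficients with $m < 0$, $e < 0$ and $a + c \geq 1$, using the closed form with arcsine arguments $(2m+n)/r$ and $n/r$ that appears in the commented-out `P.gi` line of the code. Agreement for one example does not establish the general case.

```{r}
# Hypothetical coefficients with m < 0, e < 0 and a + c >= 1
a <- 0.9; b <- -1.2; c_coef <- 0.2
e <- -(a + b + c_coef + 1)
m <- b^2 - 4 * a
n <- 2 * b * e - 4 * c_coef
r <- sqrt(n^2 - 4 * m * e^2)

# Closed form for m < 0
gini_closed <- e / 2 - n * (b + 2) / (4 * m) +
  r^2 / (8 * m * sqrt(-m)) * (asin((2 * m + n) / r) - asin(n / r))

# Independent check: Gini = 1 - 2 * area under the Lorenz curve
lorenz <- function(x) -0.5 * (b * x + e + sqrt(pmax(m * x^2 + n * x + e^2, 0)))
area <- integrate(lorenz, lower = 0, upper = 1, rel.tol = 1e-10)$value
gini_numeric <- 1 - 2 * area

c(closed = gini_closed, numeric = gini_numeric)
```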

### polarization_lq

*Description*: Computes polarization index from parametric Lorenz fit

```{r}
gd_compute_polarization_lq <- function(mean,
                                       p0,
                                       dcm,
                                       A, B, C,
                                       key_values) {
  pol <- 2 - (1 / p0) +
    (dcm - (2 * value_at_lq(p0, A, B, C, key_values) * mean)) /
      (p0 * mean * derive_lq(p0, A, B, C, key_values))

  return(pol)
}

```

*References*: None

*Relevant equations*: None

*Questions*: What is p0?
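
Reading the code, `p0` is the population share at which the Lorenz curve and its derivative are evaluated, with a default of 0.5 (the median). Substituting `dcm` $= (1 - G)\mu$, where $G$ is the Gini index and $\mu$ the mean, the expression in the code is:

$$P = 2 - \frac{1}{p_0} + \frac{(1 - G)\mu - 2 L(p_0)\mu}{p_0\, \mu\, L'(p_0)}$$

At $p_0 = 0.5$, and using median $= \mu L'(0.5)$, this reduces to:

$$P = \frac{2\left[1 - G - 2L(0.5)\right]}{L'(0.5)} = \frac{2\,(2T - G)}{\text{median}/\mu}, \qquad T = 0.5 - L(0.5)$$

This appears to coincide with the Wolfson polarization index, although the source code does not say so; treat the identification as an assumption.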

### pip_stats_lb

*Description*: Compute poverty statistics for grouped data using the beta functional form of the Lorenz curve.

```{r}
gd_compute_pip_stats_lb <- function(welfare,
                                    povline,
                                    population,
                                    requested_mean,
                                    popshare = NULL,
                                    default_ppp,
                                    ppp = NULL,
                                    p0 = 0.5) {

  # Adjust mean if different PPP value is provided
  if (!is.null(ppp)) {
    requested_mean <- requested_mean * default_ppp / ppp
  } else {
    ppp <- default_ppp
  }
  # STEP 1: Prep data to fit functional form
  prepped_data <- create_functional_form_lb(
    welfare = welfare,
    population = population
  )

  # STEP 2: Estimate regression coefficients using LB parameterization
  reg_results <- regres(prepped_data, is_lq = FALSE)
  reg_coef <- reg_results$coef

  A <- reg_coef[1]
  B <- reg_coef[2]
  C <- reg_coef[3]

  # OPTIONAL: Only when popshare is supplied
  # return poverty line if share of population living in poverty is supplied
  # instead of a poverty line

  if (!is.null(popshare)) {
    povline <- derive_lb(popshare, A, B, C) * requested_mean
  }

  # Boundary conditions (Why 4?)
  z_min <- requested_mean * derive_lb(0.001, A, B, C) + 4
  z_max <- requested_mean * derive_lb(0.980, A, B, C) - 4
  z_min <- if (z_min < 0) 0L else z_min

  results1 <- list(requested_mean, povline, z_min, z_max, ppp)
  names(results1) <- list("mean", "poverty_line", "z_min", "z_max", "ppp")

  # STEP 3: Estimate poverty measures based on identified parameters
  results2 <- gd_estimate_lb(requested_mean, povline, p0, A, B, C)

  # STEP 4: Compute measure of regression fit
  results_fit <- gd_compute_fit_lb(welfare, population, results2$headcount, A, B, C)

  res <- c(results1, results2, results_fit, reg_results)

  return(res)
}
```

### functional_form_lb

*Description*: Prepare data for Lorenz beta regression: $\log(p - L(p)) = \log(a) + \alpha \log(p) + \beta \log(1 - p)$.

```{r}
create_functional_form_lb <- function(welfare, population) {
  # CHECK inputs
  # assertthat::assert_that(is.numeric(population))
  # assertthat::assert_that(is.numeric(welfare))
  # assertthat::assert_that(length(population) == length(welfare))
  # assertthat::assert_that(length(population) > 1)

  # Remove last observation (the functional form for the Lorenz curve already
  # forces it to pass through the point (1, 1))
  nobs <- length(population) - 1
  population <- population[1:nobs]
  welfare <- welfare[1:nobs]

  # y
  y <- log(population - welfare)
  # x1
  x1 <- 1L
  # x2
  x2 <- log(population)
  # x3
  x3 <- log(1 - population)

  return(list(y = y, X = cbind(x1, x2, x3)))

}
```

*References*: \[5\], \[3\]

*Note*: The last observation of (p,l), which by construction has the value (1, 1), is excluded since the functional form for the Lorenz curve already forces it to pass through the point (1, 1).

*Relevant equations*:

The general Beta Lorenz curve (Kakwani, 1980) form is: $$ L(p) = p - a p^\alpha (1-p)^\beta$$ {#eq-beta-gen-equation}

where $L(p)$ is the cumulative proportion of consumption/income and $p$ is the cumulative proportion of the population. They correspond to $y$ and $x$ in the Quadratic Lorenz form.

Note that in Datt (1998), the parameters have different letters: $$ L(p) = p - \theta p^\gamma (1-p)^\delta$$ {#eq-beta-linear-form}

In our code, the parameters are addressed respectively as:

$$\begin{aligned}
A &= a = \theta \\
B &= \alpha = \gamma \\
C &= \beta = \delta
\end{aligned}$$

Taking logs of @eq-beta-gen-equation gives the linear form used in the regression:

$$\begin{aligned}
L(p) &= p - a p^\alpha (1-p)^\beta \\
p - L(p) &= a p^\alpha (1-p)^\beta \\
\log(p - L(p)) &= \log(a) + \alpha \log(p) + \beta \log(1 - p)
\end{aligned}$$

### value_at_lb

*Description*: Computes the value of the Beta Lorenz curve at `x`.

```{r}

value_at_lb <- function(x, A, B, C) {

  # Check for NA, Inf and negative values in x
  check_NA_Inf_values(x)
  check_neg_values(x)

  out <- x - (A * (x^B) * ((1 - x)^C))

  return(out)
}
```

*References*: \[5\]

This function calculates the value at the Beta Lorenz Curve. It solves @eq-beta-gen-equation for $L(p)$.
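
As a minimal sketch of how the curve and the log-linear regression fit together, the block below generates points from a Beta Lorenz curve with known parameters and recovers them with base R `lm()`. The parameter values and the use of `lm()` are assumptions for illustration; how `regres()` transforms the intercept is not shown in this document.

```{r}
# Hypothetical Beta Lorenz parameters (theta, gamma, delta in Datt's notation)
A <- 0.6; B <- 0.9; C <- 0.5

# Cumulative population shares (the point (1, 1) is left out) and curve values
p <- seq(0.1, 0.9, by = 0.1)
L <- p - A * p^B * (1 - p)^C

# Log-linear regression: log(p - L) = log(A) + B * log(p) + C * log(1 - p)
fit <- lm(log(p - L) ~ log(p) + log(1 - p))
est <- unname(coef(fit))

# The intercept estimates log(A), so it is exponentiated
params_hat <- c(A = exp(est[1]), B = est[2], C = est[3])
params_hat
```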

### derive_lb

*Description*: Returns the first derivative of the Beta Lorenz curve.

```{r}
derive_lb <- function(x, A, B, C) {
  val <- vector("numeric", length(x))

  # Boundary values at x = 0 and x = 1
  val[x == 0] <- -Inf
  val[x == 1] <- Inf

  if (B == 1) {
    val[x == 0] <- 1 - A
  }
  if (B > 1) {
    val[x == 0] <- 1
  }
  if (C == 1) {
    val[x == 1] <- 1 + A
  }
  if (C > 1) {
    val[x == 1] <- 1
  }

  # Formula for first derivative of the Beta Lorenz curve (interior points),
  # applied for every value of C
  interior <- !(x %in% c(0, 1))
  new_x <- x[interior]
  val[interior] <- 1 - ((A * new_x^B) * ((1 - new_x)^C) * ((B / new_x) - (C / (1 - new_x))))

  return(val)
}


```

*References*: \[3\]

*Relevant equations*:

As noted above, in our code, the parameters are addressed respectively as:

```{=tex}
\begin{aligned}
A = a = \theta \\

B = \alpha = \gamma \\

C = \beta = \delta
\end{aligned}
```
Additionally, $p = x$ and $L(p) = y$. We will use the code notation for practicality.

This function computes the first derivative of @eq-beta-gen-equation, which is derived as follows:

$$\begin{aligned}
y &= x - A x^B (1-x)^C \\
y' &= A C (1-x)^{C-1} x^B - A B (1-x)^C x^{B-1} + 1\\
 &= 1 - A (1-x)^C x^B \left[\frac{B}{x} - \frac{C}{1-x}\right]
\end{aligned}$$

The first derivative takes the following values at the boundaries:

-   at $x = 0$:
    -   if $B = 1$, $y' = 1 - A$
    -   if $B > 1$, $y' = 1$
    -   if $B < 1$, $y' = -\infty$
-   at $x = 1$:
    -   if $C = 1$, $y' = 1 + A$
    -   if $C > 1$, $y' = 1$
    -   if $C < 1$, $y' = \infty$

According to Kakwani (1980) and the rest of the literature, the only condition imposed on $L'(x) = y'$ is that $y' \geq 0$ at $x = 0$. To be finished.
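
One way to check the interior formula is to compare it with a central finite difference of the curve itself. The sketch below does this for a handful of points. The helper names, the parameter values and the step size `h` are assumptions chosen for illustration.

```{r}
# Hypothetical stand-alone helpers (interior points only, no boundary handling)
beta_lorenz_sketch <- function(x, A, B, C) x - A * x^B * (1 - x)^C
beta_lorenz_d1_sketch <- function(x, A, B, C) {
  1 - A * x^B * (1 - x)^C * (B / x - C / (1 - x))
}

# Illustrative parameter values (assumed)
A <- 0.6
B <- 0.95
C <- 0.5

x <- c(0.1, 0.3, 0.5, 0.7, 0.9)
h <- 1e-6

# Central finite difference of the curve
numeric_d1 <- (beta_lorenz_sketch(x + h, A, B, C) -
                 beta_lorenz_sketch(x - h, A, B, C)) / (2 * h)

# Largest discrepancy between the analytic and the numerical derivative
max_abs_diff <- max(abs(numeric_d1 - beta_lorenz_d1_sketch(x, A, B, C)))
max_abs_diff
```
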

### estimate_lb

*Description*: Estimates poverty and inequality stats from the Beta Lorenz fit.

```{r}
gd_estimate_lb <- function(mean, povline, p0, A, B, C) {

  # Compute distributional measures
  dist_stats <- gd_compute_dist_stats_lb(mean, p0, A, B, C)

  # Compute poverty stats
  pov_stats <- gd_compute_poverty_stats_lb(mean, povline, A, B, C)

  # Check validity
  validity <- check_curve_validity_lb(headcount = pov_stats[["headcount"]], A, B, C)

  out <- list(
    gini = dist_stats$gini,
    median = dist_stats$median,
    rmhalf = dist_stats$rmhalf,
    polarization = dist_stats$polarization,
    ris = dist_stats$ris,
    mld = dist_stats$mld,
    dcm = dist_stats$dcm,
    deciles = dist_stats$deciles,
    headcount = pov_stats$headcount,
    poverty_gap = pov_stats$pg,
    poverty_severity = pov_stats$p2,
    eh = pov_stats$eh,
    epg = pov_stats$epg,
    ep = pov_stats$ep,
    gh = pov_stats$gh,
    gpg = pov_stats$gpg,
    gp = pov_stats$gp,
    watts = pov_stats$watts,
    dl = pov_stats$dl,
    ddl = pov_stats$ddl,
    is_normal = validity$is_normal,
    is_valid = validity$is_valid
  )

  return(out)
}

```

### check_curve_validity_lb

*Description*: Checks the validity of the Lorenz Beta fit.

```{r}
check_curve_validity_lb <- function(headcount, A, B, C) {
  is_valid <- TRUE

  for (w in seq(from = 0.001, to = 0.1, by = 0.05)) {
    if (derive_lb(w, A, B, C) < 0) {
      is_valid <- FALSE
      break
    }
  }

  if (is_valid) {
    for (w in seq(from = 0.001, to = 0.999, by = 0.05)) {
      if (DDLK(w, A, B, C) < 0) { # What does DDLK stand for? What does it do?
        is_valid <- FALSE
        break
      }
    }
  }

  # What is the rationale here?
  # The fit is flagged as normal whenever the headcount is not NA
  is_normal <- !is.na(headcount)

  return(list(
    is_valid = is_valid,
    is_normal = is_normal
  ))
}

```

*References*: \[1\], \[3\]

The function tests for specific assumptions for the Lorenz Beta and relies on the formulas from Table 2 in Datt, G (1998) \[3\].
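
To make the two evaluation grids used in the loops explicit, the short sketch below prints them. The object names `grid_first` and `grid_second` are introduced here for illustration only.

```{r}
# Grid used for the first-derivative check
grid_first <- seq(from = 0.001, to = 0.1, by = 0.05)

# Grid used for the DDLK check
grid_second <- seq(from = 0.001, to = 0.999, by = 0.05)

grid_first
length(grid_second)
```

With a step of 0.05, the first grid contains only two points (0.001 and 0.051), so the first-derivative check is evaluated at those two values of `w`.
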

### DDLK

*Description*
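
The source of `DDLK` is not reproduced in this document. Judging from how it is used in `check_curve_validity_lb` and from the `ddl` expression in `compute_poverty_stats_lb`, a plausible reading is that it evaluates the second derivative of the Beta Lorenz curve. The sketch below is written under that assumption: `ddl_beta_sketch` is a hypothetical name, and the formula is copied from the `ddl` expression rather than from the `DDLK` source. It is checked against a finite difference of the first derivative.

```{r}
# Assumed form of the second derivative (taken from the `ddl` expression)
ddl_beta_sketch <- function(h, A, B, C) {
  A * h^B * (1 - h)^C *
    (B * (1 - B) / h^2 + 2 * B * C / (h * (1 - h)) + C * (1 - C) / (1 - h)^2)
}

# First derivative of the Beta Lorenz curve (interior points)
d1_beta_sketch <- function(h, A, B, C) {
  1 - A * h^B * (1 - h)^C * (B / h - C / (1 - h))
}

# Illustrative parameter values (assumed)
A <- 0.6
B <- 0.95
C <- 0.5

h    <- c(0.1, 0.5, 0.9)
step <- 1e-6

# Central finite difference of the first derivative
numeric_d2 <- (d1_beta_sketch(h + step, A, B, C) -
                 d1_beta_sketch(h - step, A, B, C)) / (2 * step)

max_abs_diff_d2 <- max(abs(numeric_d2 - ddl_beta_sketch(h, A, B, C)))
max_abs_diff_d2
```
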

### compute_poverty_stats_lb

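*Description*: Computes poverty statistics from the Beta Lorenz fit: the headcount, the poverty gap, the poverty severity, their elasticities with respect to the mean and the Gini index, and the Watts index.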

```{r}
gd_compute_poverty_stats_lb <- function(mean,
                                        povline,
                                        A,
                                        B,
                                        C) {
  # Compute headcount
  headcount <- gd_compute_headcount_lb(
    mean = mean,
    povline = povline,
    A = A,
    B = B,
    C = C
  )

  # Poverty gap
  u <- mean / povline
  pov_gap <- gd_compute_pov_gap_lb(headcount = headcount,
                                   A         = A,
                                   B         = B,
                                   C         = C,
                                   u         = u)

  # Poverty severity
  pov_gap_sq <- gd_compute_pov_severity_lb(
    headcount = headcount,
    pov_gap   = pov_gap,
    A         = A,
    B         = B,
    C         = C,
    u         = u
  )

  # First derivative of the Lorenz curve
  dl <- 1 - A * (headcount^B) * ((1 - headcount)^C) * (B / headcount - C / (1 - headcount))

  # Second derivative of the Lorenz curve
  ddl <- A * (headcount^B) *
    ((1 - headcount)^C) *
    ((B * (1 - B) / headcount^2) +
      (2 * B * C / (headcount * (1 - headcount))) +
      (C * (1 - C) / ((1 - headcount)^2)))

  # Elasticity of headcount index w.r.t mean
  eh <- -povline / (mean * headcount * ddl)

  # Elasticity of poverty gap index w.r.t mean
  epg <- 1 - (headcount / pov_gap)

  # Elasticity of distributionally sensitive FGT poverty measure w.r.t mean
  ep <- 2 * (1 - pov_gap / pov_gap_sq)

  # Elasticity of headcount index w.r.t gini index
  gh <- (1 - povline / mean) / (headcount * ddl)

  # Elasticity of poverty gap index w.r.t gini index
  gpg <- 1 + (((mean / povline) - 1) * headcount / pov_gap)

  # Elasticity of distributionally sensitive FGT poverty measure w.r.t gini index
  gp <- 2 * (1 + (((mean / povline) - 1) * pov_gap / pov_gap_sq))

  # Watts index
  watts <- gd_compute_watts_lb(headcount, mean, povline, 0.005, A, B, C)

  return(
    list(
      headcount = headcount,
      pg = pov_gap,
      p2 = pov_gap_sq,
      eh = eh,
      epg = epg,
      ep = ep,
      gh = gh,
      gpg = gpg,
      gp = gp,
      watts = watts,
      dl = dl,
      ddl = ddl
    )
  )
}
```
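
For reference, the elasticity expressions in the code above can be transcribed as follows. The notation is an assumption introduced here: $H$ is the headcount, $PG$ the poverty gap (`pov_gap`), $P_2$ the poverty severity (`pov_gap_sq`), $\mu$ the mean, $z$ the poverty line and $L''(H)$ the second derivative `ddl`. The left-hand sides use the names returned by the function.

$$\begin{aligned}
\mathrm{eh} &= -\frac{z}{\mu \, H \, L''(H)} &\qquad \mathrm{gh} &= \frac{1 - z/\mu}{H \, L''(H)} \\
\mathrm{epg} &= 1 - \frac{H}{PG} &\qquad \mathrm{gpg} &= 1 + \left(\frac{\mu}{z} - 1\right) \frac{H}{PG} \\
\mathrm{ep} &= 2 \left(1 - \frac{PG}{P_2}\right) &\qquad \mathrm{gp} &= 2 \left[1 + \left(\frac{\mu}{z} - 1\right) \frac{PG}{P_2}\right]
\end{aligned}$$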

### compute_headcount_lb

*Description*: This function calculates the poverty headcount $H$ from the Lorenz Beta fit. It returns `NA` if the headcount is invalid or if `BETAI` evaluates to `NA` for the parameter combinations needed in the subsequent calculations.

```{r}
gd_compute_headcount_lb <- function(mean, povline, A, B, C) {
  # Compute headcount
  
  # First, the function uses rtSafe to bracket the root and solve for H:
  headcount <- rtSafe(0.0001, 0.9999, 1e-4,
    mean = mean,
    povline = povline,
    A = A,
    B = B,
    C = C
  )
  
  # Then, it checks headcount invalidity conditions:
  if (headcount < 0 | is.na(headcount)) {
    return(NA_real_)
  }
  
  # It also checks whether the BETAI for these parameters is NA, if so it would prevent us from using it in the next calculations. 
  condition1 <- is.na(BETAI(
    a = 2 * B - 1,
    b = 2 * C + 1,
    x = headcount
  ))
  
  condition2 <- is.na(BETAI(
    a = 2 * B,
    b = 2 * C,
    x = headcount
  ))
  
  condition3 <- is.na(BETAI(
    a = 2 * B + 1,
    b = 2 * C - 1,
    x = headcount
  ))
  
  # If any of the conditions is NA, it returns NA altogether.
  if (condition1 | condition2 | condition3) {
    return(NA_real_)
  }

  return(headcount)
}

```

*References*:

*Relevant equations*:

### BETAI

*Description*: `BETAI` calculates the incomplete beta function using its continued fraction representation (`BETAICF`) and the logarithm of the gamma function (`GAMMLN`).

```{r}
BETAI <- function(a, b, x) {
  if (!is.na(x)) {
    bt <- betai <- 0

    if (x == 0 || x == 1) {
      bt <- 0
    } else {
      bt <- exp((a * log(x)) + (b * log(1 - x)))
    }

    if (x < (a + 1) / (a + b + 2)) {
      betai <- bt * BETAICF(a, b, x) / a
    } else if (is.na(GAMMLN(a)) || is.na(GAMMLN(b)) || is.na(GAMMLN(a + b))) {
      betai <- NA_real_
    } else {
      # I think this is wrong, it should be 1 - I_(1-x)(b,a), this is just I_(1-x)(b,a).
      betai <- exp(GAMMLN(a) + GAMMLN(b) - GAMMLN(a + b)) - (bt * BETAICF(b, a, 1 - x) / b)
    }
  } else {
    betai <- NA_real_
  }

  return(betai)
}

```

*References*:\[6\]

*Relevant equations*:

The incomplete beta function is defined by $$I_x(a, b) = \frac{B_x(a, b)}{B(a, b)} = \frac{1}{B(a, b)} \int_0^x t^{a-1}(1-t)^{b-1} \, dt \quad (a, b > 0)$$ It has the limiting values: $$I_0(a, b) = 0 \quad I_1(a, b) = 1$$ and the symmetry relation: $$I_x(a, b) = 1 - I_{1-x}(b, a)$$

If $a$ and $b$ are both much greater than one, then $I_x(a, b)$ rises sharply from near zero to near unity at about $x = a/(a+b)$. The continued fraction representation of this function is better suited to numerical evaluation than the series expansion (see the equation in the `BETAICF` documentation below).

`BETAICF` converges rapidly for $x < (a + 1)/(a + b + 2)$. For $x > (a + 1)/(a + b + 2)$ we can use the symmetry relation to obtain a form that also converges quickly (the symmetric version, $1 - I_{1-x}(b, a)$).

::: {#imp-betai-issue .callout-important}
#### Issue:

Following the reasoning of \[6\] and the code provided there (written in C), I believe that our current code does not calculate the incomplete beta function correctly. Here are the steps to calculate it correctly:

```{=tex}
\begin{align*}
I_x(a, b) &= \frac{x^a(1 - x)^b}{aB(a, b)} \left[ \frac{1}{1 + \frac{d_1}{1 + \frac{d_2}{1 + \cdots}}} \right] \\
&= \frac{x^a(1 - x)^b}{a \color{blue}{B(a, b)}} \times BETAICF(.)
\end{align*}
```
In our code, the second part of this multiplication is calculated using the `BETAICF` function. In order to calculate $B(a,b)$ (blue above), we use the relationship between the beta and the gamma functions: $$B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$ However, as explained in `GAMMLN`, it is faster and more reliable to calculate $\log\Gamma(\cdot)$, which is what our `GAMMLN` function does. Therefore, we need to calculate:

```{=tex}
\begin{align*}
I_x(a, b) &= \frac{x^a(1 - x)^b}{aB(a, b)} \left[ \frac{1}{1 + \frac{d_1}{1 + \frac{d_2}{1 + \cdots}}} \right] \\
&= \frac{\color{green}{x^a(1 - x)^b}}{a \color{blue}{B(a, b)}} \times \mathrm{BETAICF}(a, b, x) \\
&= \exp(\color{blue}{\mathrm{GAMMLN}(a+b) - \mathrm{GAMMLN}(a) - \mathrm{GAMMLN}(b)} \\
& \quad + \color{green}{a \cdot \log(x) + b \cdot \log(1-x)}) \times \mathrm{BETAICF}(a, b, x) \cdot 1/a
\end{align*}
```
However, when $x > (a + 1) / (a + b + 2)$, we need to use the symmetry property and calculate instead:

```{=tex}
\begin{align*}
1 - I_{1-x}(b,a) &= 1 - \exp(\color{blue}{\mathrm{GAMMLN}(a+b) - \mathrm{GAMMLN}(a) - \mathrm{GAMMLN}(b)} \\
& \quad + \color{green}{a \cdot \log(x) + b \cdot \log(1-x)}) \times \mathrm{BETAICF}(b, a, 1-x) \cdot 1/b
\end{align*}
```
Therefore, the correct implementation of the function should be something like:

```{r}
BETAI <- function(a, b, x) {
  if (!is.na(x)) {
    bt <- betai <- 0

    if (x == 0 || x == 1) {
      bt <- 0
    } else {
      bt <- exp(GAMMLN(a+b) - GAMMLN(a) - GAMMLN(b) + a*log(x) + b*log(1-x)) # the term bt has already the gammaln() and it is already exp() in this version
    }

    if (x < (a + 1) / (a + b + 2)) {
      betai <- bt * BETAICF(a, b, x) / a # here we keep the I_x(a,b)
    } else if (is.na(GAMMLN(a)) || is.na(GAMMLN(b)) || is.na(GAMMLN(a + b))) {
      betai <- NA_real_
    } else {
      betai <- 1 - bt * BETAICF(b, a, 1 - x) / b # here we use the symmetry relation 1 - I_(1-x)(b, a)
    }
  } else {
    betai <- NA_real_
  }

  return(betai)
}


```
:::
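
When comparing the two versions of `BETAI`, it may help to distinguish the regularized incomplete beta function $I_x(a, b)$ from the unregularized one, $B_x(a, b) = B(a, b) \, I_x(a, b)$. The sketch below uses only base R (`pbeta` for $I_x$ and `beta` for $B$) as an independent reference. The values of `a`, `b` and `x` are illustrative assumptions. It checks the symmetry relation and shows that an expression of the form $B(a, b) - B_{1-x}(b, a)$ equals $B_x(a, b)$. Which of the two quantities the downstream formulas expect has not been verified against the `wbpip` source here.

```{r}
# Illustrative values (assumed); x lies above (a + 1) / (a + b + 2) = 3 / 7
a <- 2
b <- 3
x <- 0.7

# Regularized incomplete beta I_x(a, b)
reg <- pbeta(x, a, b)

# Unregularized incomplete beta B_x(a, b) = B(a, b) * I_x(a, b)
unreg <- beta(a, b) * reg

# Symmetry relation: I_x(a, b) = 1 - I_{1-x}(b, a)
sym <- 1 - pbeta(1 - x, b, a)

# B(a, b) - B_{1-x}(b, a), the unregularized counterpart of the symmetry relation
alt <- beta(a, b) - beta(b, a) * pbeta(1 - x, b, a)

c(reg = reg, sym = sym, unreg = unreg, alt = alt)
```
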

### BETAICF

*Description*: Used by `BETAI`. This function evaluates the continued fraction part of the incomplete beta function.

```{r}
BETAICF <- function(a, b, x) {
  eps <- 3e-7
  am <- 1
  bm <- 1
  az <- 1
  qab <- a + b
  qap <- a + 1
  qam <- a - 1
  bz <- c( 1 - (qab * x / qap), rep(1, 99))

  m <- 1:100
  em <- 1:100
  tem <- em * 2
  d <- em * (b - m) * x / ((qam + tem) * (a + tem))
  d2 <- -(a + em) * (qab + em) * x / ((a + tem) * (qap + tem))

  for (i in seq_len(100)) {
    ap <- az + (d[i] * am)
    bp <- bz[i] + (d[i] * bm)
    app <- ap + (d2[i]  * az)
    bpp <- bp + (d2[i] * bz[i])
    aold <- az
    am <- ap / bpp
    bm <- bp / bpp
    az <- app / bpp
    if ((abs(az - aold)) < (eps * abs(az))) {
      break
    }
  }
  return(az)
}
```

*References*: \[6\]

*Relevant equations*: The incomplete beta function in continued fraction representation is defined as: $$I_x(a, b) = \frac{x^a(1 - x)^b}{aB(a, b)} \color{blue}\left[ \frac{1}{1 + \frac{d_1}{1 + \frac{d_2}{1 + \cdots}}} \right]$$ where $$d_{2m+1} = -\frac{(a + m)(a + b + m)x}{(a + 2m)(a + 2m + 1)}$$ and $$d_{2m} = \frac{m(b - m)x}{(a + 2m - 1)(a + 2m)}$$ This function calculates the second part, highlighted in blue in the formula above.
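
As an illustration of the formula above, the sketch below evaluates the continued fraction directly from the coefficients $d_m$, working from the innermost term outwards, and compares the resulting $I_x(a, b)$ with base R's `pbeta`. This is a different evaluation scheme from the recurrence used in `BETAICF`. The helper `cf_backward_sketch`, the truncation depth and the input values are assumptions chosen for illustration.

```{r}
# Continued fraction 1 / (1 + d_1 / (1 + d_2 / (1 + ...))), truncated after
# `depth` coefficients and evaluated from the innermost term outwards
cf_backward_sketch <- function(a, b, x, depth = 200) {
  d <- function(k) {
    m <- k %/% 2
    if (k %% 2 == 0) {
      # Even index: d_{2m}
      m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
    } else {
      # Odd index: d_{2m+1}
      -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
    }
  }
  val <- 1
  for (k in depth:1) {
    val <- 1 + d(k) / val
  }
  1 / val
}

# Illustrative values (assumed); x < (a + 1) / (a + b + 2)
a <- 2
b <- 3
x <- 0.3

# Prefactor x^a (1 - x)^b / (a B(a, b)) times the continued fraction
prefactor  <- x^a * (1 - x)^b / (a * beta(a, b))
i_x_sketch <- prefactor * cf_backward_sketch(a, b, x)

# Compare with the reference value from base R
c(sketch = i_x_sketch, pbeta = pbeta(x, a, b))
```
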

### GAMMLN

*Description*: `GAMMLN` calculates the logarithm of the gamma function, $\log \Gamma(\cdot)$. It is used as part of the `BETAI` function.

```{r}
GAMMLN <- function(xx) {
  cof <- c(76.18009173, 
           -86.50532033, 
           24.01409822, 
           -1.231739516, 
           0.120858003e-2, 
           -0.536382e-5)
  
  stp <- 2.50662827465
  fpf <- 5.5
  x <- xx - 1
  tmp <- x + fpf
  
  if (tmp <= 0) {
    return(NA_real_)
  }

  tmp <- (x + 0.5) * log(tmp) - tmp
  # ser <- 1L
  x <-  c(x + 1:6)
  ser <- sum(cof / x) + 1

  if (stp * ser <= 0) {
    return(NA_real_)
  }

  return(tmp + log(stp * ser))
}

```

Annotated version:

```{r}
GAMMLN <- function(xx) {
  # Coefficients for the Lanczos approximation; these are pre-computed constants
  cof <- c(76.18009173, 
           -86.50532033, 
           24.01409822, 
           -1.231739516, 
           0.120858003e-2, 
           -0.536382e-5)
  
  # sqrt(2*pi), which is a part of the normalization constant in the Lanczos formula
  stp <- 2.50662827465
  
  # Constant to offset x for the approximation; gamma = 5 -> equal to gamma + 1/2
  fpf <- 5.5
  
  # Adjust the input xx by subtracting 1, given that we calculate it for z + 1
  x <- xx - 1
  
  # Calculate the term (z + gamma + 1/2)
  tmp <- x + fpf
  
  # Check if the adjusted term is non-positive
  if (tmp <= 0) {
    return(NA_real_)
  }

  # Calculate part of the Lanczos formula:
  # This corresponds to the (z + gamma + 1/2)^(z + 1/2) * e^(-(z + gamma + 1/2)) in the approximation
  # e is the base of the natural logarithm; since we are calculating the log-gamma,
  # we are taking the log of e^(-(z+gamma+1/2)), which simplifies to just -(z+gamma+1/2)
  tmp <- (x + 0.5) * log(tmp) - tmp
  
  # Create a sequence of (z+1) to (z+6) for the Lanczos series in the denominator
  x <-  c(x + 1:6)
  
  # Calculate the Lanczos series sum using the coefficients (cof)
  # and add 1 to it for the c_0 term in the Lanczos formula
  ser <- sum(cof / x) + 1

  # Check if the final series is non-positive
  if (stp * ser <= 0) {
    return(NA_real_)
  }

  # Return the final computed value of the logarithm of the gamma function
  # This combines the calculated series with the earlier part and the log of the normalization constant (stp)
  return(tmp + log(stp * ser))
}

```

*References*:\[6\]

*Relevant equations*: The gamma function is defined by the integral: $$\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} \, dt$$

To calculate it numerically, we use the Lanczos approximation \[6\]:

$$\begin{aligned}
\Gamma(z + 1) &\approx \left(z + \gamma + \frac{1}{2}\right)^{z+\frac{1}{2}} e^{-(z+\gamma+\frac{1}{2})} \\
& \quad \times \sqrt{2\pi} \left[ c_0 + \frac{c_1}{z + 1} + \frac{c_2}{z + 2} + \ldots + \frac{c_N}{z + N} + \epsilon \right] \\
& \quad \text{with } z > 0
\end{aligned}$$

With parameters $\gamma = 5$ and $N = 6$.

As noted in \[6\], it is easier to compute the logarithm of the gamma function, which is what this function does. Therefore, the actual equation we use is this:

$$\begin{aligned}
\ln(\Gamma(z + 1)) & \approx \left( z + \frac{1}{2} \right) \ln\left( z + \gamma + \frac{1}{2} \right) - \left( z + \gamma + \frac{1}{2} \right) + \ln\left( \sqrt{2\pi} \right) \\
& \quad + \ln\left( c_0 + \frac{c_1}{z + 1} + \frac{c_2}{z + 2} + \ldots + \frac{c_N}{z + N} \right)
\end{aligned}$$

*Steps*:

1.  `cof` is an array of coefficients ($c_{1}, c_{2}, \ldots, c_{N}$) used for the series expansion.

::: {#imp-gammaln-issue .callout-important}
#### Issue:

The coefficients chosen in the Lanczos approximation are not used anymore in the newest implementations of the C code. The newest literature suggests different coefficients (Available in the third edition of Numerical Recipes in C).
:::

2.  Compute the adjusted input: `x <- xx - 1` shifts the input so that it matches the formulation of the Lanczos approximation, which is written for $\Gamma(z + 1)$.

3.  Create `fpf`: this is equivalent to $\gamma + \frac{1}{2}$, with $\gamma = 5$. It is added to `x` to form `tmp`, that is, $z + \gamma + \frac{1}{2}$.

4.  Check the argument of the logarithm: if `tmp` is non-positive, its logarithm is undefined and the function returns `NA_real_`.

5.  Compute the Lanczos sum: `ser` is the sum of each coefficient in `cof` divided by the incremented input ($z + 1, \ldots, z + 6$), plus 1 for the $c_{0}$ term. The code computes it in vectorised form as `sum(cof / x) + 1` rather than with a loop.

6.  Calculate the final log-gamma value: Combine the logarithm of `stp` ($\sqrt{2\pi}$) with the sum `ser` and the main part of the Lanczos approximation involving `tmp`.

7.  Return the result: The function returns the computed value, which is the natural logarithm of the gamma function for the input `xx`.
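
The steps above can be condensed into a few lines and compared with base R's `lgamma`. The sketch below is a compact restatement under that reading: the name `lanczos_lgamma_sketch` is hypothetical, the `NA` checks are omitted, and the test points are assumptions restricted to inputs of at least one.

```{r}
# Compact restatement of the Lanczos steps (no NA checks), for comparison only
lanczos_lgamma_sketch <- function(xx) {
  cof <- c(76.18009173, -86.50532033, 24.01409822,
           -1.231739516, 0.120858003e-2, -0.536382e-5)
  z   <- xx - 1          # the approximation is written for Gamma(z + 1)
  tmp <- z + 5.5         # z + gamma + 1/2 with gamma = 5
  ser <- 1 + sum(cof / (z + 1:6))
  (z + 0.5) * log(tmp) - tmp + log(2.50662827465 * ser)
}

# Illustrative test points (assumed)
xx <- c(1, 2.5, 10, 50)

approx_vals <- vapply(xx, lanczos_lgamma_sketch, numeric(1))
max_abs_err <- max(abs(approx_vals - lgamma(xx)))
max_abs_err
```
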

### rtSafe

*Description*: `rtSafe` is an implementation of a 'safe' version of `rtNewt`. It combines Newton-Raphson steps with bisection inside a bracketing interval so that the search stays robust and converges.

```{r}
rtSafe <- function(x1, x2, xacc, mean, povline, A, B, C) {
  
  # Evaluate the function at the initial guesses x1 and x2:
  
  funcCall1 <- funcD(x1, mean, povline, A, B, C)
  fl <- funcCall1[[1]]  # Function value at x1 (low)

  funcCall2 <- funcD(x2, mean, povline, A, B, C)
  fh <- funcCall2[[1]]  # Function value at x2 (high)
  df <- funcCall2[[2]]  # Derivative value at x2

  # Check if the function values at x1 and x2 bracket a root (have opposite signs)
  if (fl * fh >= 0) {
    # If they do not bracket a root, fall back to the Newton-Raphson method (which has larger brackets)
    res <- rtNewt(mean = mean, povline = povline, A = A, B = B, C = C)
    return(res)
  }

  # Assign xl and xh to bracket the root with xl having a function value less than 0
  if (fl < 0) {
    xl <- x1
    xh <- x2
  } else {
    xl <- x2
    xh <- x1
  }

  # Initialize the safe guess for the root as the midpoint between x1 and x2
  rtsafe <- 0.5 * (x1 + x2)
  # Set the full step and fractional step sizes to the interval length
  dxold <- abs(x2 - x1)
  dx <- dxold

  # Enter the main iteration loop
  for (i in seq_len(99)) {
    # Update the function and derivative at the current guess rtsafe
    funcCall3 <- funcD(rtsafe, mean, povline, A, B, C)
    f <- funcCall3[[1]]
    df <- funcCall3[[2]]

    # Check whether a Newton-Raphson step would shrink the step size fast enough
    if (abs(2 * f) > abs(dxold * df)) {
      # If not, bisect the interval
      dxold <- dx
      dx <- 0.5 * (xh - xl)
      rtsafe <- xl + dx
      if (xl == rtsafe) return(rtsafe)  # Check for underflow
    } else {
      # Otherwise, take a Newton-Raphson step
      dxold <- dx
      dx <- f / df
      temp <- rtsafe
      rtsafe <- temp - dx
      if (temp == rtsafe) return(rtsafe)  # Check for underflow
    }

    # If the size of the Newton-Raphson step is less than the tolerance, return the root
    if (abs(dx) < xacc) return(rtsafe)

    # Update the function value at the new guess
    funcCall4 <- funcD(rtsafe, mean, povline, A, B, C)
    f <- funcCall4[[1]]

    # Update the interval bounds to continue the bisection process
    if (f < 0) {
      xl <- rtsafe
    } else {
      xh <- rtsafe
    }
  }
  # If the loop finishes without returning, it indicates failure to converge
  return(NA_real_)
}

```

*Steps*:

1.  Input parameters:
    -   `x1` and `x2`: initial guesses for the root, defining a bracket within which the root lies.
    -   `xacc`: desired accuracy for the root.
    -   `mean`, `povline`, and the parameters of the Lorenz curve (`A`, `B`, `C`).
2.  Initial Function Evaluation:
    -   Calculate the function values at `x1` and `x2`.
    -   If the function values at these points do not bracket a root (i.e., they do not have opposite signs), the function falls back to `rtNewt`, which uses wider default bounds (0 and 1).
3.  Bracketing the Root:
    -   Assign `xl` and `xh` to be the lower and upper bounds of the bracketing interval, respectively.
    -   Ensure that the lower bound `xl` has a function value less than zero.
    -   *Note*: According to the Intermediate Value Theorem, if a continuous function changes sign over an interval, then there must be at least one root (zero crossing) within that interval. Therefore, bracketing is a way of confirming that within the interval `[xl, xh]` there is at least one root.
4.  Iterative Process:
    -   Initialize the midpoint guess for the root, `rtsafe`, as halfway between `x1` and `x2`.
    -   Set up variables `dxold` and `dx` to control the iteration step size.
5.  Main Iteration Loop:
    -   For each iteration, calculate the function value and its derivative at `rtsafe`.
    -   Use a combination of bisection and Newton-Raphson updates to adjust `rtsafe` and home in on the root.
    -   If the function value at the new guess `rtsafe` is negative, update `xl`; otherwise, update `xh`.
    -   Check for convergence by comparing the change in `rtsafe` with the tolerance `xacc`.
    -   Repeat this process for a maximum of 99 iterations or until the root is found within the desired accuracy.
6.  Convergence Check:
    -   If the algorithm converges to a root within the desired accuracy, the function returns the value of `rtsafe`.
    -   If the function fails to converge within the set number of iterations, it returns `NA_real_` to indicate an unsuccessful search.
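
As an independent cross-check of the bracketed search, the sketch below solves the same headcount equation with base R's `uniroot`, which is not what `wbpip` uses. The parameter values, the ratio `povline_sketch / mean_sketch` and the object names are assumptions chosen so that the function changes sign over the bracket `[0.0001, 0.9999]`.

```{r}
# Illustrative inputs (assumed)
A <- 0.6
B <- 0.95
C <- 0.5
mean_sketch    <- 2
povline_sketch <- 1

# Function whose root is the headcount (same expression as `f` in funcD)
f_headcount_sketch <- function(x) {
  A * x^B * (1 - x)^C * (B / x - C / (1 - x)) + povline_sketch / mean_sketch - 1
}

# Bracketed root search with base R
root_sketch <- uniroot(f_headcount_sketch,
                       interval = c(0.0001, 0.9999),
                       tol = 1e-10)$root

# At the root, the slope of the Lorenz curve equals povline / mean
slope_at_root <- 1 - A * root_sketch^B * (1 - root_sketch)^C *
  (B / root_sketch - C / (1 - root_sketch))

c(headcount = root_sketch, slope = slope_at_root)
```
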

### funcD

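*Description*: `funcD` returns the value `f` and the derivative `df` of the function whose root is the headcount $H$: $$\theta H^\gamma (1-H)^\delta \left[\frac{\gamma}{H} - \frac{\delta}{1-H}\right] + \frac{z}{\mu} - 1 = 0$$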

```{r}
funcD <- function(x, mean, povline, A, B, C) {
  x1 <- 1 - x
  v1 <- (x^B) * (x1^C)
  f <- (A * v1 * ((B / x) - (C / x1))) + (povline / mean) - 1
  df <- A * v1 * (((B / x) - (C / x1))^2 - (B / x^2) - (C / x1^2))
  return(list(
    f = f,
    df = df
  ))
}


```

*Relevant Equations*:

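-   `f` is the left-hand side of $A x^B (1-x)^C \left[\frac{B}{x} - \frac{C}{1-x}\right] + \frac{z}{\mu} - 1 = 0$, which equals $\frac{z}{\mu} - L'(x)$.
-   `df` is the derivative of `f` with respect to `x`, which equals $-L''(x)$.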

See `rtNewt` for more details.
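
The expression for `df` follows from the product rule. The shorthand $g$ and $h$ below is introduced here only for this derivation:

$$\begin{aligned}
g(x) &= x^B (1-x)^C, \qquad h(x) = \frac{B}{x} - \frac{C}{1-x} \\
g'(x) &= g(x) \, h(x), \qquad h'(x) = -\frac{B}{x^2} - \frac{C}{(1-x)^2} \\
f(x) &= A \, g(x) \, h(x) + \frac{z}{\mu} - 1 \\
f'(x) &= A \left[ g'(x) h(x) + g(x) h'(x) \right] = A \, g(x) \left[ h(x)^2 - \frac{B}{x^2} - \frac{C}{(1-x)^2} \right]
\end{aligned}$$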

### rtNewt

*Description*: `rtNewt` uses the Newton-Raphson iteration to find $H$ (the headcount), that is, the root of: $$\theta H^\gamma (1-H)^\delta \left[\frac{\gamma}{H} - \frac{\delta}{1-H}\right] + \frac{z}{\mu} - 1 = 0$$

```{r}
rtNewt <- function(mean, povline, A, B, C) {
  # Initial bounds of the search interval
  x1 <- 0L
  x2 <- 1L
  # Accuracy tolerance for the solution
  xacc <- 1e-4
  # Initial guess for the root, halfway between x1 and x2
  rtnewt <- 0.5 * (x1 + x2)

  # Perform up to 19 iterations to find the root
  for (i in seq_len(19)) {
    # Current guess for the proportion of the population
    x <- rtnewt
    # Part of the Lorenz curve derivative: x^B * (1 - x)^C
    v1 <- (x^B) * ((1 - x)^C)
    # The function whose root we're finding; it's the normalized poverty line
    # (povline / mean) minus the Lorenz curve's derivative at x
    f <- A * v1 * ((B / x) - (C / (1 - x))) + (povline / mean) - 1
    # Derivative of f with respect to x
    df <- A * v1 * (((B / x) - (C / (1 - x)))^2 - (B / x^2) - (C / (1 - x)^2))
    # Newton-Raphson step size
    dx <- f / df
    # Update the current guess for the root
    rtnewt <- rtnewt - dx

    # Check if the new guess is outside the initial interval
    if ((x1 - rtnewt) * (rtnewt - x2) < 0) {
      # Pull the guess back inside the interval, based on the previous guess x
      rtnewt <- if (rtnewt < x1) { 0.5 * (x2 - x) } else { 0.5 * (x - x1) }
    } else {
      # If the change in the root estimate is smaller than the tolerance, return it
      if (abs(dx) < xacc) {
        return(rtnewt)
      }
    }
  }
  # If no convergence, return NA to indicate failure
  return(NA_real_)
}
```

*Relevant equations*:

The headcount $H$ is the point at which the slope of the Lorenz curve equals the ratio of the poverty line $z$ to the mean $\mu$. For the Lorenz Beta curve this gives:

```{=tex}
\begin{aligned}
L'(H) = 1 - \theta H^\gamma (1-H)^\delta \left[\frac{\gamma}{H} - \frac{\delta}{1-H}\right] = \frac{z}{\mu} \\
\theta H^\gamma (1-H)^\delta \left[\frac{\gamma}{H} - \frac{\delta}{1-H}\right] = 1 - \frac{z}{\mu}
\end{aligned}
```
The Newton-Raphson iteration is used to find the root of an equation. In our case, this means finding the value of $H$ for which:

$$\theta H^\gamma (1-H)^\delta \left[\frac{\gamma}{H} - \frac{\delta}{1-H}\right] + \frac{z}{\mu} - 1 = 0$$

*Steps:*

-   Initialize the algorithm:

    -   `x1` and `x2` are the initial boundaries of the search interval
    -   `xacc` is the accuracy tolerance for the solution
    -   `rtnewt` is the initial guess for the root, taken as the midpoint of the interval `[x1, x2]`

-   Begin the Iterative process:

    -   Each iteration consists of:
        -   `v1` calculates the common factor $x^B (1-x)^C = H^\gamma (1-H)^\delta$.
        -   `f` evaluates the function whose root we seek, $f(x) = \frac{z}{\mu} - L'(x)$, as defined above.
        -   `df` evaluates its derivative, $f'(x) = -L''(x)$.

-   Update the Guess for the Root

    -   `dx` calculates the Newton-Raphson step size (`dx = f/df` = $\frac{f(x_{old})}{f'(x_{old})}$)
    -   `rtnewt` updates the current guess by subtracting `dx` from it. This step is based on the Newton-Raphson update rule:

    $$
    x_{new} = x_{old} - \frac{f(x_{old})}{f'(x_{old})}
    $$

-   Check for interval boundaries and convergence: after each iteration, the algorithm checks whether the new guess `rtnewt` lies within the initial interval `[x1, x2]`. If it does not, the guess is reset to a point inside the interval. Otherwise, if the step size `dx` is smaller than `xacc`, the function returns the root `rtnewt`. If no root is found after 19 iterations, it returns `NA_real_`.
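
The update rule can be illustrated with a bare Newton-Raphson loop. The sketch below leaves out the interval check of `rtNewt` and uses a tighter tolerance. The helper names, the parameter values and the ratio `zmu` (poverty line over mean) are assumptions chosen for illustration.

```{r}
# Function whose root is the headcount, and its derivative (as in funcD)
f_sketch <- function(x, zmu, A, B, C) {
  A * x^B * (1 - x)^C * (B / x - C / (1 - x)) + zmu - 1
}
df_sketch <- function(x, A, B, C) {
  h <- B / x - C / (1 - x)
  A * x^B * (1 - x)^C * (h^2 - B / x^2 - C / (1 - x)^2)
}

# Illustrative inputs (assumed)
A   <- 0.6
B   <- 0.95
C   <- 0.5
zmu <- 0.5

# Start from the midpoint of [0, 1] and iterate x_new = x_old - f / f'
x <- 0.5
for (i in seq_len(20)) {
  dx <- f_sketch(x, zmu, A, B, C) / df_sketch(x, A, B, C)
  x  <- x - dx
  if (abs(dx) < 1e-10) break
}
headcount_sketch <- x
headcount_sketch
```
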

...

## References

1.  Villasenor, J., B. C. Arnold. 1989. "[Elliptical Lorenz curves](https://doi.org/10.1016/0304-4076(89)90089-4)". *Journal of Econometrics 40* (2): 327-338.

2.  Krause, M. 2013. "[Corrigendum to Elliptical Lorenz curves](https://doi.org/10.1016/j.jeconom.2013.01.001)". *Journal of Econometrics 174* (1): 44.

3.  Datt, G. 1998. "[Computational Tools For Poverty Measurement And Analysis](https://www.ifpri.org/publication/computational-tools-poverty-measurement-and-analysis)". FCND Discussion Paper 50. International Food Policy Research Institute, Washington, DC.

4.  Rohde, N. (2008). "[Lorenz Curves and Generalised Entropy Inequality Measures](https://doi.org/10.1007/978-0-387-72796-7_15)". In: Chotikapanich, D. (eds) *Modeling Income Distributions and Lorenz Curves. Economic Studies in Equality, Social Exclusion and Well-Being*, vol 5. Springer, New York, NY.

5.  Kakwani, N. 1980. "[On a Class of Poverty Measures](https://EconPapers.repec.org/RePEc:ecm:emetrp:v:48:y:1980:i:2:p:437-46)". *Econometrica 48* (2): 437-46.

6.  Press, W. H. et al. 1992. "[Gamma Function, Beta Function, Factorials, Binomial Coefficients](https://s3.amazonaws.com/nrbook.com/book_C210.html)", Section 6.1, page 213 in *Numerical recipes in C: the art of scientific computing* (2nd edition). Cambridge University Press.

## Appendix

Find each function using the following list:

-   @sec-pip_stats: `gd_compute_pip_stats`
-   @sec-select_lorenz: `gd_select_lorenz`
-   `gd_compute_pip_stats_lq`
-   `gd_compute_pip_stats_lb`
-   `create_functional_form_lq`
-   `derive_lq`
-   `gd_estimate_lq`
-   `check_curve_validity_lq`
-   `gd_compute_dist_stats_lq`