Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for models with unit specific time trends? #24

Open
TomBearpark opened this issue Mar 1, 2024 · 10 comments
Open

Support for models with unit specific time trends? #24

TomBearpark opened this issue Mar 1, 2024 · 10 comments

Comments

@TomBearpark
Copy link

Hi, thanks for a great package!

I am trying to calculate CR3 standard errors for a model that includes unit specific time trends. Do you have any advice on how / if this can be implemented in summclust?

Here's a minimal example, showing the error that I get when I try to use the summclust() function on a model that includes unit specific time trends.

library(summclust)
library(fixest)
df <- fixest::trade

fit  <- feols(Euros ~ dist_km | Year + Origin + Origin[Year], data = df)
summ <- summclust(fit, params = ~dist_km, cluster = ~Origin)

The error is pasted below:

Error in `[[.default`(Origin, Year) : 
  attempt to select more than one element in vectorIndex

Thanks so much for your time and your help

@s3alfisc
Copy link
Owner

s3alfisc commented Mar 1, 2024

Hi @TomBearpark, I think I'd have to change the data preprocessing to allow for that. I'll take a look at this tomorrow!

@s3alfisc
Copy link
Owner

s3alfisc commented Mar 1, 2024

I was about to suggest that you could try sandwich::vcovBS, but it looks like the same error arises:

library(sandwich)
vcovBS(fit, cluster = ~Origin, type = "jackknife", center = "estimate")

Unfortunately you also cannot use fwildclusterboot::boottest(), as is also complains about an error with varying slopes syntax:

library(fwildclusterboot)
boottest(fit, param = ~dist_km, clustid = ~Origin, B = 9999, bootstrap_type = "31")

So I'll take a closer look at this tomorrow!

Best, Alex

@TomBearpark
Copy link
Author

Thanks so much for your quick response Alex, and for looking into this!

@TomBearpark
Copy link
Author

TomBearpark commented Mar 12, 2024

Hi @s3alfisc, I just wanted to follow up on this thread. Is there anything I could do to help?

As far as I am aware, none of the existing packages provide support for estimating CV3 SEs with unit specific time trends.

One practical solution I considered was to partial out the time trends prior to using your package. E.g., like this:

library(summclust)
library(fixest)
df <- fixest::trade

# Target model
fit  <- feols(Euros ~ dist_km | Year + Origin + Origin[Year], data = df)
# summ <- summclust(fit, params = ~dist_km, cluster = ~Origin)

# Residualise out the time trends 
df$Euros_dt   <- residuals(feols(Euros ~ 1 |  Origin[Year], data = df))
df$dist_km_dt <- residuals(feols(dist_km ~ 1 |  Origin[Year], data = df))

fit2 <- feols(Euros_dt ~ dist_km_dt | Year + Origin, data = df)

# Check we get the same coefficients
etable(fit, fit2)

## Estimate the CR3 SEs on the residualised model
summ <- summclust(fit2, params = ~dist_km_dt, cluster = ~Origin)

# Compare the CR3 SE to the other estimators... 
se(summ$vcov)['dist_km_dt']
se(fit)['dist_km']
se(fit2)['dist_km_dt']

But I wasn't sure if there is some degrees of freedom correction, or other issue I am unaware of, that means this doesn't give the correct SEs. I'd be very grateful for any advice on this you may have!

Thanks again,
Tom

@s3alfisc
Copy link
Owner

s3alfisc commented Mar 12, 2024

Hi @TomBearpark - I took another look and this and realized that it will not be possible for me to solve this. The reason is that the summclust algo expands the fixed effects into dummy variables via R's model.matrix and I have not implemented logic for expanding varying slopes.

There are two things that you can do, demeaning prior to running summclust will not give you correct standard errors (at least I think so) - I can explain this tomorrow but I don't have the time tonight.

Option 1: You implement the crv3 inference by hand. This can be done by simply looping over data sets for individual regressions. I implement such logic in pyfixest here.

                        beta_jack = np.zeros((len(clustid), _k))
                        for ixg, g in enumerate(clustid):
                            # direct leave one cluster out implementation
                            data = _data[~np.equal(g, cluster_col)]
                            fit = feols(fml=self._fml, data=data, vcov="iid")
                            beta_jack[ixg, :] = fit.coef().to_numpy()

                    # optional: beta_bar in MNW (2022)
                    # center = "estimate"
                    # if center == 'estimate':
                    #    beta_center = beta_hat
                    # else:
                    #    beta_center = np.mean(beta_jack, axis = 0)
                    beta_center = _beta_hat

                    vcov = np.zeros((_k, _k))
                    for ixg, g in enumerate(clustid):
                        beta_centered = beta_jack[ixg, :] - beta_center
                        vcov += np.outer(beta_centered, beta_centered)

                    self._vcov += self._ssc[x] * vcov

In a nutshell, for each cluster level g, you take the subset of data of cluster g. Then you fit the regression model on the subset.
You repeat for all G clusters. You substract the initial estimate of the "full" model (i.e. beta_g - beta). You compute the crossproduct and add this for each of the G clusters. Finally, you apply a small sample correction by dividing by G. I can provide you with R code tomorrow but don't jae

The other option is that you simply don't use fixest's [var1]var2 syntax for nested fixed effects and use base R formulas instead. These should work with summclust, e.g.

library(broom)
fit  = feols(log(Euros) ~ dist_km |Destination + Year + Destination[Year], df)
broom::tidy(fit, digits = 8)

fit = feols(log(Euros) ~ dist_km + as.factor(Destination):as.factor(Year) | Destination + Year, df)
broom::tidy(fit, digits = 8)[1:2, ]

should give identical results.

@s3alfisc
Copy link
Owner

s3alfisc commented Mar 12, 2024

Just implemented an R function quickly that should not throw any errors:

vcov3 = function(object, cluster){
  
  data = fixest:::fetch_data(object)
  cluster = data[,cluster]
  unique_cluster = as.vector(unique(cluster))
  G = length(unique(cluster))
  fml_all = object$fml_all
  
  obj_call = object$call
  obj_call$data = quote(data[cluster == g,])
  
  beta_centered = lapply(c(unique_cluster), function(g) coef(eval(obj_call)) - coef(object)) 
  vcov = lapply(seq_along(unique_cluster), function(g) tcrossprod(as.matrix(beta_centered[[g]])))
  vcov = Reduce("+", vcov) / G
  vcov
  
}

#summary(fit)

vcov3(fit, "Year") # no error
sandwich::vcovBS(fit, cluster = ~Year, type = "jackknife", center = "estimate") fails

You can use sandwich::vcovBS to test vcov3. No guarantee that it does work, though I think it should get close.

@TomBearpark
Copy link
Author

Thanks so much for your response! Your code is extremely helpful! With a few tweaks to your function, i think the version below seems to work well! The tweaks are:

  • Making it work for tibbles as well as dataframes
  • Doing leave one cluster out, rather than keep on cluster for the jackknife
  • Degrees of freedom correction should be G/(G-1)
vcov3 = function(object, cluster){
  
  data = fixest:::fetch_data(object)

  # to be able to process tibbles
  data = as.data.frame(data) 
  cluster = data[,cluster]
  unique_cluster = as.vector(unique(cluster))
  G = length(unique(cluster))
  fml_all = object$fml_all  
  obj_call = object$call

  # need to remove a cluster not keep only one
  obj_call$data = quote(data[cluster != g,]) 
  
  beta_centered = lapply(c(unique_cluster), function(g) coef(eval(obj_call)) - coef(object)) 
  vcov = lapply(seq_along(unique_cluster), function(g) tcrossprod(as.matrix(beta_centered[[g]])))

  # correction factor has changed
  vcov = Reduce("+", vcov) * (G-1)/G 
  vcov
}

Thanks again!
Tom

@s3alfisc
Copy link
Owner

Perfect! Glad that it works =) sorry about the few mistakes, it was approaching 23:00 here in Germany yesterday night and my brain wasn't at full capacity any more 😅

I think we should try to make this available more broadly. Maybe you could open an issue in the fixest repo and post the code there? Laurent has mentioned that he will have some guidelines on how to incorporate vcov types in a near release, and then you (in case you want to) or I could pick it up and make the function available from fixest?

@TomBearpark
Copy link
Author

Thanks @s3alfisc ! I've posted the code into a fixest issue here. lrberge/fixest#481

@s3alfisc
Copy link
Owner

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants