-
Notifications
You must be signed in to change notification settings - Fork 727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
create standard dgp for metric aggregation #705
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good start but there are a few things that should be cleaned up.
econml/data/dgps.py
Outdated
@@ -93,3 +94,182 @@ def _process_ihdp_sim_data(): | |||
# Append a column of ones as intercept | |||
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1) | |||
return T, X | |||
|
|||
|
|||
class StandardDGP(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(very minor)
class StandardDGP(): | |
class StandardDGP: |
econml/data/dgps.py
Outdated
@@ -93,3 +94,182 @@ def _process_ihdp_sim_data(): | |||
# Append a column of ones as intercept | |||
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1) | |||
return T, X | |||
|
|||
|
|||
class StandardDGP(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docstring, especially since some arguments can be a function or a dict.
econml/data/dgps.py
Outdated
else: # else must be dict | ||
if nuisance_Y is None: | ||
nuisance_Y = {'support': self.d_x, 'degree': 1} | ||
nuisance_Y['k'] = self.d_x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider asserting the expected type here, something like:
nuisance_Y['k'] = self.d_x | |
assert(isinstance(nuisance_Y, dict), f"nuisance_Y must be a callable or dict, but got {type(nuisance_Y)}") | |
nuisance_Y['k'] = self.d_x |
(and likewise for the other similar arguments)
|
||
def gen_Y(self): | ||
self.y_noise = np.random.normal(size=(self.n, self.d_y), scale=self.y_eps) | ||
self.Y = self.theta(self.X) * self.y_of_t(self.T) + self.nuisance_Y(self.X) + self.y_noise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like y_of_t is really a treatment featurizer, is that right? If so, change the name accordingly.
econml/data/dgps.py
Outdated
else: # else must be dict | ||
if nuisance_Y is None: | ||
nuisance_Y = {'support': self.d_x, 'degree': 1} | ||
nuisance_Y['k'] = self.d_x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting nuisance_Y['k']
is not strictly necessary since the default behavior of gen_nuisance
will do this for you, but perhaps it's worth being explicit.
More importantly, making this assignment will mutate the dictionary that was passed in, which is possibly undesirable; probably better to do something like:
nuisance_Y['k'] = self.d_x | |
nuisance_Y = { **nuisance_Y, 'k':self.d_x } |
which will create a new dictionary instead of altering the old one.
Furthermore, what if the dictionary already had a 'k'
entry - do you really want to ignore and override it?
econml/data/dgps.py
Outdated
self.t_eps = t_eps | ||
|
||
def gen_Y(self): | ||
self.y_noise = np.random.normal(size=(self.n, self.d_y), scale=self.y_eps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider allowing the user to specify a seed so that results are reproducible.
econml/data/dgps.py
Outdated
@@ -93,3 +94,182 @@ def _process_ihdp_sim_data(): | |||
# Append a column of ones as intercept | |||
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1) | |||
return T, X | |||
|
|||
|
|||
class StandardDGP(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably support generating and using W as well.
econml/data/dgps.py
Outdated
@@ -93,3 +94,182 @@ def _process_ihdp_sim_data(): | |||
# Append a column of ones as intercept | |||
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1) | |||
return T, X | |||
|
|||
|
|||
class StandardDGP(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably support using a user-specified featurizer on X.
econml/data/dgps.py
Outdated
@@ -93,3 +94,182 @@ def _process_ihdp_sim_data(): | |||
# Append a column of ones as intercept | |||
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1) | |||
return T, X | |||
|
|||
|
|||
class StandardDGP(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add tests to at least verify that the API can actually be used and that you get arrays of the right dimensions out, etc., even if there aren't good checks for the actual data (but perhaps there are useful checks there, too)
econml/data/dgps.py
Outdated
@@ -93,3 +94,182 @@ def _process_ihdp_sim_data(): | |||
# Append a column of ones as intercept | |||
X = np.insert(X, 0, np.ones(X.shape[0]), axis=1) | |||
return T, X | |||
|
|||
|
|||
class StandardDGP(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect users to use this class to test things? If so, perhaps also add a DGP section to the documentation (and at least add it to the module docs). But if this is mainly for internal use, add a comment to the docstring indicating that instead.
5961cbf
to
47ad1ca
Compare
No description provided.