diff --git a/montecarlo/README.md b/montecarlo/README.md
new file mode 100644
index 000000000..9d7dce89c
--- /dev/null
+++ b/montecarlo/README.md
@@ -0,0 +1,100 @@
+<!-- Copyright (c) Microsoft Corporation. All rights reserved. -->
+<!-- Licensed under the MIT License. -->
+
+# Monte Carlo Library
+The folder mcpy contains code for running Monte Carlo experiments from config files and for saving, loading, and plotting the results.
+Look at dml_te_config.py and dml_te_functions.py for an example.
+
+We can run a Monte Carlo simulation from the command line with the following (the config is imported as a module, so pass the module name without the .py extension):
+```
+python3 run_mc_from_config.py --config dml_te_config
+```
+
+The config dictionary specifies the parameters of the DGPs, estimation methods, metrics, and plot functions for a set of Monte Carlo experiments. The MonteCarlo class runs experiments for each DGP and each method, and then computes metrics and plots on the results. DGPs, estimation methods, metrics, and plot functions can all be user defined, so this is a general framework for automating Monte Carlo simulations.
+
+The config looks like the following:
+```python
+CONFIG = {
+  # 'single_parameter' runs a simulation for a fixed set of parameters;
+  # 'sweep_parameter' runs multiple simulations for all permutations of the parameters
+  "type" : 'single_parameter',
+  # references to dgp functions
+  "dgps" : {
+    'dgp1': dml_te_functions.gen_data
+  },
+  # references to functions that generate the instance parameters for each dgp
+  "dgp_instance_fns" : {
+    'dgp1': dml_te_functions.instance_params
+  },
+  # dgp options for each dgp
+  "dgp_opts" : {
+    'dgp1': {
+        'n_samples': 2000,
+        'n_features': 1,
+        'n_controls': 30,
+        'support_size': 5
+      },
+  },
+  # references to methods (estimation functions), by method name
+  "methods" : {
+    "LinearDMLCate": dml_te_functions.linear_dml_fit
+  },
+  # method options, for each method
+  "method_opts" : {
+    "LinearDMLCate": {
+      'model_y': RandomForestRegressor(),
+      'model_t': RandomForestRegressor(),
+      'inference': 'statsmodels'
+    }
+  },
+  # references to metric functions
+  "metrics" : {
+    'rmse': metrics.rmse,
+    'conf_length': metrics.conf_length,
+    'coverage': metrics.coverage,
+    'std': metrics.std,
+    'coverage_band': metrics.coverage_band
+  },
+  # references to plot functions
+  "plots" : {
+    'plot1': plotting.plot_metrics,
+    'plot2': plotting.plot_visualization,
+    'plot3': plotting.plot_violin
+  },
+  # different metrics are plotted differently:
+  # single summary metrics are a single value per dgp and method
+  "single_summary_metrics" : ['coverage_band'], # list of the metrics that are single summary metrics
+  # plots to run when the type is 'sweep_parameter'
+  "sweep_plots" :{
+  },
+  # monte carlo simulation options
+  "mc_opts" : {
+    'n_experiments': 5, # number of monte carlo experiments
+    "seed": 123
+  },
+  # reference to a baseline method to compare against (not currently supported)
+  "proposed_method" : {
+  },
+  # directory to save results and plots to
+  "target_dir" : "",
+  "reload_results" : False # whether to load previously saved results instead of rerunning
+}
+```
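+
+The same config can also be run programmatically. The following is a minimal sketch that mirrors what run_mc_from_config.py does, assuming the CONFIG dictionary from dml_te_config.py:
+```python
+from mcpy.monte_carlo import MonteCarlo, MonteCarloSweep
+from dml_te_config import CONFIG
+
+if CONFIG['type'] == 'single_parameter':
+    # returns per-experiment results, metric values, and the true parameters
+    simulation_results, metric_results, true_params = MonteCarlo(CONFIG).run()
+elif CONFIG['type'] == 'sweep_parameter':
+    # returns one entry per swept parameter combination
+    sweep_keys, sweep_sim_results, sweep_metrics, sweep_true_params = MonteCarloSweep(CONFIG).run()
+```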
+
+# Some Next Steps
+
+- Use a cloud computing service to run all simulations in parallel on many cores. This would yield better simulation results and allow us to run simulations over many more parameters, giving a better picture of the empirical performance of the estimators.
+- Many HTEs are calculated for a treatment change from some T0 to some T1. When evaluating performance, the user should perhaps specify those treatment vectors in the dgp and method functions they write, or perhaps they should also be swept over in MonteCarloSweep. An open question is how to sweep over different values for them (e.g., in what increments), especially as they become high dimensional.
+- Construct more DGPs, and look at more real data sets to perform simulations on.
+- Integrate this with the rest of the code suite so that whenever significant updates to the estimator implementations are made, a suite of MC simulations is run and the results are compared to those from before the change. This would show how the empirical performance of the estimators changes with the implementation.
+- Add more metrics and plotting functions, and try to generalize them rather than tying them to particular DGP/method implementations.
+- An overall challenge is keeping the framework general while still allowing nitty-gritty DGPs/methods/metrics/plots to be defined. Loose return types such as lists, tuples, or dictionaries let many different DGP/method implementations use the framework at the same time, with each implementation specified via the references in the config file. The best way to see what small adjustments are needed is to implement a range of DGPs/methods/metrics/plots and observe which return types should be made more general or more specific. For example, when X_test is multidimensional there is a question of how to plot the metrics: against the first column of X_test, or something else? This may force plot functions that are specific to that DGP/method setup. The framework lets anyone write their own dgps/methods/metrics/plot functions, but more generality would be preferable, perhaps for sets or classes of DGPs that are often tested together. Anything that avoids lots of DGP-specific code helps keep the test codebase from growing large.
+
+# Example Plots
+<p align="center">
+  <img src="example_plots/plot1_rmse.png" height="100" title="rmse">
+  <img src="example_plots/plot1_coverage.png" height="100" title="coverage">
+  <img src="example_plots/plot1_conf_length.png" height="100" title="conf_length">
+  <img src="example_plots/plot1_std.png" height="100" title="std">
+  <img src="example_plots/plot2.png" height="100" title="Performance comparison of estimators">
+  <img src="example_plots/plot3.png" height="100" title="Coverage band">
+  <br/>
+  <i>Example plots for DGP in dml_te_config.py</i>
+</p>
diff --git a/montecarlo/dml_te_config.py b/montecarlo/dml_te_config.py
new file mode 100644
index 000000000..ca18b541c
--- /dev/null
+++ b/montecarlo/dml_te_config.py
@@ -0,0 +1,92 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import os
+import numpy as np
+import dml_te_functions
+from mcpy import metrics
+from mcpy import plotting
+from mcpy import utils
+from sklearn.linear_model import Lasso, LassoCV, LogisticRegression, LogisticRegressionCV, LinearRegression, MultiTaskElasticNet, MultiTaskElasticNetCV
+from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
+from sklearn.preprocessing import PolynomialFeatures
+
+CONFIG = {
+        "type": 'single_parameter',
+        "dgps": {
+            "dgp1": dml_te_functions.gen_data
+        },
+        "dgp_instance_fns": {
+            'dgp1': dml_te_functions.instance_params
+        },
+        "dgp_opts": {
+            'dgp1': {
+                'n_samples': 2000,
+                'n_features': 1,
+                'n_controls': 30,
+                'support_size': 5
+            },
+        },
+        "methods": {
+            "LinearDMLCate": dml_te_functions.linear_dml_fit,
+            "SparseLinearDMLCate": dml_te_functions.sparse_linear_dml_poly_fit,
+            # "DMLCate": dml_te_functions.dml_poly_fit,
+            # "ForestDMLCate": dml_te_functions.forest_dml_fit
+        },
+        "method_opts": {
+                'LinearDMLCate': {
+                    'model_y': RandomForestRegressor(),
+                    'model_t': RandomForestRegressor(),
+                    'inference': 'statsmodels'
+                },
+                'SparseLinearDMLCate': {
+                    'model_y': RandomForestRegressor(),
+                    'model_t': RandomForestRegressor(),
+                    'featurizer': PolynomialFeatures(degree=3),
+                    'inference': 'debiasedlasso'
+                },
+                'DMLCate': {
+                    'model_y': RandomForestRegressor(),
+                    'model_t': RandomForestRegressor(),
+                    'model_final': Lasso(alpha=0.1, fit_intercept=False),
+                    'featurizer': PolynomialFeatures(degree=10),
+                    'inference': 'bootstrap'
+                },
+                'ForestDMLCate': {
+                    'model_y': RandomForestRegressor(),
+                    'model_t': RandomForestRegressor(),
+                    'discrete_treatment': False,
+                    'n_estimators': 1000,
+                    'subsample_fr': 0.8,
+                    'min_samples_leaf': 10,
+                    'min_impurity_decrease': 0.001,
+                    'verbose': 0,
+                    'min_weight_fraction_leaf': 0.01,
+                    'inference': 'bootstrap'
+                }
+        },
+        "metrics": {
+            'rmse': metrics.rmse,
+            'conf_length': metrics.conf_length,
+            'coverage': metrics.coverage,
+            'std': metrics.std,
+            'coverage_band': metrics.coverage_band
+        },
+        "plots": {
+            'plot1': plotting.plot_metrics,
+            'plot2': plotting.plot_visualization,
+            'plot3': plotting.plot_violin
+        },
+        # different metrics are plotted differently
+        # single summary metrics are a single value per dgp and method
+        "single_summary_metrics": ['coverage_band'],
+        "sweep_plots": {
+        },
+        "mc_opts": {
+            'n_experiments': 5, # number of monte carlo experiments
+            "seed": 123
+        },
+        "proposed_method": "CrossOrtho",
+        "target_dir": "dml_te_test",
+        "reload_results": False
+    }
diff --git a/montecarlo/dml_te_functions.py b/montecarlo/dml_te_functions.py
new file mode 100644
index 000000000..1f3aa3da6
--- /dev/null
+++ b/montecarlo/dml_te_functions.py
@@ -0,0 +1,264 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+from econml.dml import DMLCateEstimator, LinearDMLCateEstimator, SparseLinearDMLCateEstimator, ForestDMLCateEstimator
+import numpy as np
+from itertools import product
+from sklearn.linear_model import Lasso, LassoCV, LogisticRegression, LogisticRegressionCV, LinearRegression, MultiTaskElasticNet, MultiTaskElasticNetCV
+from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
+from sklearn.preprocessing import PolynomialFeatures
+import matplotlib.pyplot as plt
+import matplotlib
+from sklearn.model_selection import train_test_split
+
+def instance_params(opts, seed):
+    """
+    Generates the instance parameters for the DGP (see gen_data below) from the double
+    machine learning jupyter notebook at
+    https://github.com/microsoft/EconML/blob/master/notebooks/Double%20Machine%20Learning%20Examples.ipynb
+
+    Parameters
+    ----------
+    opts : dictionary
+        dgp-specific options from config file
+    seed : int
+        random seed for random data generation
+
+    Returns
+    -------
+    instance_params : dictionary of instance parameters for DGP
+    """
+    n_controls = opts['n_controls']
+    n_features = opts['n_features']
+    n_samples = opts['n_samples']
+    support_size = opts['support_size']
+
+    instance_params = {}
+    instance_params['support_Y'] = np.random.choice(np.arange(n_controls), size=support_size, replace=False)
+    instance_params['coefs_Y'] = np.random.uniform(0, 1, size=support_size)
+    instance_params['support_T'] = instance_params['support_Y']
+    instance_params['coefs_T'] = np.random.uniform(0, 1, size=support_size)
+    return instance_params
+
+def gen_data(opts, instance_params, seed):
+    """
+    Implements the following DGP from
+    double machine learning jupyter notebook at
+    https://github.com/microsoft/EconML/blob/master/notebooks/Double%20Machine%20Learning%20Examples.ipynb:
+
+    Y = T * Theta(X) + <W, beta> + epsilon,   epsilon ~ Uniform(-1, 1)
+    T = <W, gamma> + eta,                     eta ~ Uniform(-1, 1)
+    W ~ N(0, I_{n_controls})
+    X ~ Uniform(0, 1)^{n_features}
+
+    Parameters
+    ----------
+    opts : dictionary
+        dgp-specific options from config file
+    instance_params : dictionary
+        instance parameters beta, gamma, etc. for DGP. These are
+        fixed for each time the DGP is run, so that each experiment uses the same instance
+        parameters but different Y, X, T, W.
+    seed : int
+        random seed for random data generation
+
+    Returns
+    -------
+    (X_test, Y_train, T_train, X_train, W_train), expected_te : tuple consisting of
+    training and test data, and the true parameters.
+    X_test : test data to evaluate the method on, n x n_features
+    Y_train : outcome data
+    T_train : treatment data
+    X_train : feature (covariate) data
+    W_train : controls data
+    expected_te : true value of the treatment effect on X_test
+    """
+    n_controls = opts['n_controls']
+    n_features = opts['n_features']
+    n_samples = opts['n_samples']
+    support_Y = instance_params['support_Y']
+    coefs_Y = instance_params['coefs_Y']
+    support_T = instance_params['support_T']
+    coefs_T = instance_params['coefs_T']
+    epsilon_sample = lambda n: np.random.uniform(-1, 1, size=n)
+    eta_sample = lambda n: np.random.uniform(-1, 1, size=n)
+
+    # Generate controls, covariates, treatments, and outcomes
+    W = np.random.normal(0, 1, size=(n_samples, n_controls))
+    X = np.random.uniform(0, 1, size=(n_samples, n_features))
+
+    # Treatment effect function
+    def exp_te(x):
+        return np.exp(2*x[0])
+
+    # Heterogenous treatment effects
+    TE = np.array([exp_te(x) for x in X])
+    T = np.dot(W[:, support_T], coefs_T) + eta_sample(n_samples)
+    Y = TE * T + np.dot(W[:, support_T], coefs_Y) + epsilon_sample(n_samples)
+    # hold out 20% of the generated sample; only the training portion is returned below,
+    # so the validation split is currently unused
+    Y_train, Y_val, T_train, T_val, X_train, X_val, W_train, W_val = train_test_split(Y, T, X, W, test_size=.2)
+
+    # Generate test data
+    X_test = np.array(list(product(np.arange(0, 1, 0.01), repeat=n_features)))
+    expected_te = np.array([exp_te(x_i) for x_i in X_test])
+
+    # data, true_param
+    return (X_test, Y_train, T_train, X_train, W_train), expected_te
+
+def linear_dml_fit(data, opts, seed):
+    """
+    Trains the LinearDMLCateEstimator to predict the heterogenous treatment effects
+
+    Parameters
+    ----------
+    data : tuple
+        relevant data that comes from the data generating process
+    opts : dictionary
+        method-specific options
+    seed : int
+        random seed for random data generation
+
+    Returns
+    -------
+    (X_test, const_marginal_effect), (lb, ub) : tuple of HTE and confidence interval
+    X_test : Test data for use in plotting downstream
+    const_marginal_effect : desired treatment effect
+    (lb, ub) : tuple of confidence intervals for each data point in X_test
+    """
+    X_test, Y, T, X, W = data
+    model_y = opts['model_y']
+    model_t = opts['model_t']
+    inference = opts['inference']
+
+    est = LinearDMLCateEstimator(model_y=model_y, model_t=model_t, random_state=seed)
+    est.fit(Y, T, X, W, inference=inference)
+    const_marginal_effect = est.const_marginal_effect(X_test) # same as est.effect(X_test)
+    lb, ub = est.const_marginal_effect_interval(X_test, alpha=0.05)
+
+    return (X_test, const_marginal_effect), (lb, ub)
+
+def sparse_linear_dml_poly_fit(data, opts, seed):
+    """
+    Trains the SparseLinearDMLCateEstimator to predict the heterogenous treatment effects
+
+    Parameters
+    ----------
+    data : tuple
+        relevant data that comes from the data generating process
+    opts : dictionary
+        method-specific options
+    seed : int
+        random seed for random data generation
+
+    Returns
+    -------
+    (X_test, const_marginal_effect), (lb, ub) : tuple of HTE and confidence interval
+    X_test : Test data for use in plotting downstream
+    const_marginal_effect : desired treatment effect
+    (lb, ub) : tuple of confidence intervals for each data point in X_test
+    """
+    X_test, Y, T, X, W = data
+    model_y = opts['model_y']
+    model_t = opts['model_t']
+    featurizer = opts['featurizer']
+    inference = opts['inference']
+
+    est = SparseLinearDMLCateEstimator(model_y=model_y,
+        model_t=model_t,
+        featurizer=featurizer,
+        random_state=seed)
+    est.fit(Y, T, X, W, inference=inference)
+    const_marginal_effect = est.const_marginal_effect(X_test) # same as est.effect(X)
+    lb, ub = est.const_marginal_effect_interval(X_test, alpha=0.05)
+
+    return (X_test, const_marginal_effect), (lb, ub)
+
+def dml_poly_fit(data, opts, seed):
+    """
+    Trains the DMLCateEstimator to predict the heterogenous treatment effects
+
+    Parameters
+    ----------
+    data : tuple
+        relevant data that comes from the data generating process
+    opts : dictionary
+        method-specific options
+    seed : int
+        random seed for random data generation
+
+    Returns
+    -------
+    (X_test, const_marginal_effect), (lb, ub) : tuple of HTE and confidence interval
+    X_test : Test data for use in plotting downstream
+    const_marginal_effect : desired treatment effect
+    (lb, ub) : tuple of confidence intervals for each data point in X_test
+    """
+    X_test, Y, T, X, W = data
+    model_y = opts['model_y']
+    model_t = opts['model_t']
+    featurizer = opts['featurizer']
+    model_final = opts['model_final']
+    inference = opts['inference']
+
+    est = DMLCateEstimator(model_y=model_y,
+        model_t=model_t,
+        model_final=model_final,
+        featurizer=featurizer,
+        random_state=seed)
+    est.fit(Y, T, X, W, inference=inference)
+    const_marginal_effect = est.const_marginal_effect(X_test) # same as est.effect(X)
+    lb, ub = est.const_marginal_effect_interval(X_test, alpha=0.05)
+
+    return (X_test, const_marginal_effect), (lb, ub)
+
+def forest_dml_fit(data, opts, seed):
+    """
+    Trains the ForestDMLCateEstimator to predict the heterogenous treatment effects
+
+    Parameters
+    ----------
+    data : tuple
+        relevant data that comes from the data generating process
+    opts : dictionary
+        method-specific options
+    seed : int
+        random seed for random data generation
+
+    Returns
+    -------
+    (X_test, const_marginal_effect), (lb, ub) : tuple of HTE and confidence interval
+    X_test : Test data for use in plotting downstream
+    const_marginal_effect : desired treatment effect
+    (lb, ub) : tuple of confidence intervals for each data point in X_test
+    """
+    X_test, Y, T, X, W = data
+    model_y = opts['model_y']
+    model_t = opts['model_t']
+    discrete_treatment = opts['discrete_treatment']
+    n_estimators = opts['n_estimators']
+    subsample_fr = opts['subsample_fr']
+    min_samples_leaf = opts['min_samples_leaf']
+    min_impurity_decrease = opts['min_impurity_decrease']
+    verbose = opts['verbose']
+    min_weight_fraction_leaf = opts['min_weight_fraction_leaf']
+    inference = opts['inference']
+
+    est = ForestDMLCateEstimator(model_y=model_y,
+        model_t=model_t,
+        discrete_treatment=discrete_treatment,
+        n_estimators=n_estimators,
+        subsample_fr=subsample_fr,
+        min_samples_leaf=min_samples_leaf,
+        min_impurity_decrease=min_impurity_decrease,
+        verbose=verbose,
+        min_weight_fraction_leaf=min_weight_fraction_leaf)
+    est.fit(Y, T, X, W, inference=inference)
+    const_marginal_effect = est.const_marginal_effect(X_test) # same as est.effect(X)
+    lb, ub = est.const_marginal_effect_interval(X_test, alpha=0.05)
+
+    return (X_test, const_marginal_effect), (lb, ub)
+
+def main():
+    pass
+
+if __name__=="__main__":
+    main()
diff --git a/montecarlo/dml_te_test/results_seed123.jbl b/montecarlo/dml_te_test/results_seed123.jbl
new file mode 100644
index 000000000..d01932cdd
Binary files /dev/null and b/montecarlo/dml_te_test/results_seed123.jbl differ
diff --git a/montecarlo/example_plots/plot1_conf_length.png b/montecarlo/example_plots/plot1_conf_length.png
new file mode 100644
index 000000000..2b5891f0d
Binary files /dev/null and b/montecarlo/example_plots/plot1_conf_length.png differ
diff --git a/montecarlo/example_plots/plot1_coverage.png b/montecarlo/example_plots/plot1_coverage.png
new file mode 100644
index 000000000..44874bc0b
Binary files /dev/null and b/montecarlo/example_plots/plot1_coverage.png differ
diff --git a/montecarlo/example_plots/plot1_rmse.png b/montecarlo/example_plots/plot1_rmse.png
new file mode 100644
index 000000000..adc164971
Binary files /dev/null and b/montecarlo/example_plots/plot1_rmse.png differ
diff --git a/montecarlo/example_plots/plot1_std.png b/montecarlo/example_plots/plot1_std.png
new file mode 100644
index 000000000..21c3f0f10
Binary files /dev/null and b/montecarlo/example_plots/plot1_std.png differ
diff --git a/montecarlo/example_plots/plot2.png b/montecarlo/example_plots/plot2.png
new file mode 100644
index 000000000..a4e3e2016
Binary files /dev/null and b/montecarlo/example_plots/plot2.png differ
diff --git a/montecarlo/example_plots/plot3.png b/montecarlo/example_plots/plot3.png
new file mode 100644
index 000000000..d8c8bc8ba
Binary files /dev/null and b/montecarlo/example_plots/plot3.png differ
diff --git a/montecarlo/mcpy/__init__.py b/montecarlo/mcpy/__init__.py
new file mode 100644
index 000000000..e69de29bb
diff --git a/montecarlo/mcpy/metrics.py b/montecarlo/mcpy/metrics.py
new file mode 100644
index 000000000..d4f96b0aa
--- /dev/null
+++ b/montecarlo/mcpy/metrics.py
@@ -0,0 +1,63 @@
+import numpy as np
+
+def l1_error(x, y): return np.linalg.norm(x-y, ord=1)
+def l2_error(x, y): return np.linalg.norm(x-y, ord=2)
+def bias(x, y): return x - y
+def raw_estimate(x, y): return x
+def truth(x, y): return y
+def raw_estimate_nonzero_truth(x, y): return x[y>0]
+
+def rmse(param_estimates, true_params):
+    X_test = param_estimates[0][0][0]
+    pred = np.array([param_estimates[i][0][1] for i in range(len(param_estimates))])
+    rmse = np.sqrt(np.mean(np.array(pred-true_params)**2, axis=0))
+    return (X_test, rmse)
+
+def std(param_estimates, true_params):
+    X_test = param_estimates[0][0][0]
+    pred = np.array([param_estimates[i][0][1] for i in range(len(param_estimates))])
+    return (X_test, np.std(pred, axis=0))
+
+def conf_length(param_estimates, true_params):
+    X_test = param_estimates[0][0][0]
+    lb = np.array([param_estimates[i][1][0] for i in range(len(param_estimates))])
+    ub = np.array([param_estimates[i][1][1] for i in range(len(param_estimates))])
+    conf_length = np.mean(np.array(ub-lb), axis=0)
+    return (X_test, conf_length)
+
+def coverage(param_estimates, true_params):
+    X_test = param_estimates[0][0][0]
+    coverage = []
+    for i, param_estimate in enumerate(param_estimates):
+        lb, ub = param_estimate[1]
+        coverage.append((lb <= true_params[i]) & (true_params[i] <= ub))
+    return (X_test, np.mean(coverage, axis=0))
+
+def coverage_band(param_estimates, true_params):
+    coverage_band = []
+    for i, param_estimate in enumerate(param_estimates):
+        lb, ub = param_estimate[1]
+        covered = (lb<=true_params[i]).all() & (true_params[i]<=ub).all()
+        coverage_band.append(covered)
+    return np.array(coverage_band)
+
+def transform_identity(x, dgp, method, metric, config):
+    return x[dgp][method][metric]
+
+def transform_diff(x, dgp, method, metric, config):
+    return x[dgp][method][metric] - x[dgp][config['proposed_method']][metric]
+
+def transform_ratio(x, dgp, method, metric, config):
+    return 100 * (x[dgp][method][metric] - x[dgp][config['proposed_method']][metric]) / x[dgp][method][metric]
+
+def transform_agg_mean(x, dgp, method, metric, config):
+    return np.mean(x[dgp][method][metric], axis=1)
+
+def transform_agg_median(x, dgp, method, metric, config):
+    return np.median(x[dgp][method][metric], axis=1)
+
+def transform_agg_max(x, dgp, method, metric, config):
+    return np.max(x[dgp][method][metric], axis=1)
+
+def transform_diff_positive(x, dgp, method, metric, config):
+    return x[dgp][method][metric] - x[dgp][config['proposed_method']][metric] >= 0
diff --git a/montecarlo/mcpy/monte_carlo.py b/montecarlo/mcpy/monte_carlo.py
new file mode 100644
index 000000000..8d783ab1b
--- /dev/null
+++ b/montecarlo/mcpy/monte_carlo.py
@@ -0,0 +1,238 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import os
+import sys
+import numpy as np
+from joblib import Parallel, delayed
+import joblib
+import argparse
+import importlib
+from itertools import product
+import collections
+
+from copy import deepcopy
+from mcpy.utils import filesafe
+from mcpy import plotting
+
+def check_valid_config(config):
+    """
+    Performs a basic check of the config file, checking if the necessary
+    subsections are present.
+
+    If multiple config files are being made that use the same dgps and/or methods,
+    it may be helpful to tailor the config check to those dgps and methods. That way,
+    one can check that the correct parameters are being provided for those dgps and methods.
+    This is specific to one's implementation, however.
+    """
+    assert 'type' in config, "config dict must specify config type"
+    assert 'dgps' in config, "config dict must contain dgps"
+    assert 'dgp_opts' in config, "config dict must contain dgp_opts"
+    assert 'method_opts' in config, "config dict must contain method_opts"
+    assert 'mc_opts' in config, "config dict must contain mc_opts"
+    assert 'metrics' in config, "config dict must contain metrics"
+    assert 'methods' in config, "config dict must contain methods"
+    assert 'plots' in config, "config dict must contain plots"
+    assert 'single_summary_metrics' in config, "config dict must specify which metrics are plotted in a y-x plot vs. as a single value per dgp and method"
+    assert 'target_dir' in config, "config must contain target_dir"
+    assert 'reload_results' in config, "config must contain reload_results"
+    assert 'n_experiments' in config['mc_opts'], "config[mc_opts] must contain n_experiments"
+    assert 'seed' in config['mc_opts'], "config[mc_opts] must contain seed"
+
+class MonteCarlo:
+    """
+    This class contains methods to run (multiple) monte carlo experiments
+
+    Experiments are constructed from a config file, which mainly consists of
+    references to the implementations of four different kinds of items, in
+    addition to various parameters for the experiment. See the README for
+    a description of the config file, or look at dml_te_config.py for an example.
+    The four main items are:
+     - data generating processes (dgps): functions that generate data according to
+     some assumed underlying model
+     - methods: functions that take in data and produce other data. In our case,
+     they train on data produced by DGPs and then produce counterfactual estimates
+     - metrics: functions that take in the results of estimators and calculate metrics
+     - plots: functions that take in the metric results, etc. and generate plots
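+
+    The expected call signatures (as used by experiment() and run() below) are:
+     - dgp(dgp_opts, instance_params, seed) -> (data, true_param)
+     - method(data, method_opts, seed) -> estimate
+     - metric(list of estimates across experiments, list of true_params) -> metric value
+     - plot(plot_name, simulation_results, metric_results, true_params, config) -> None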
+    """
+
+    def __init__(self, config):
+        self.config = config
+        check_valid_config(self.config)
+        # these param strings are for properly naming results saved to disk
+        config['param_str'] = '_'.join(['{}_{}'.format(filesafe(k), v) for k,v in self.config['mc_opts'].items()])
+        config['param_str'] += '_' + '_'.join(['{}_{}'.format(filesafe(k), v) for k,v in self.config['dgp_opts'].items()])
+        config['param_str'] += '_' + '_'.join(['{}_{}'.format(filesafe(k), v) for k,v in self.config['method_opts'].items()])
+
+    def experiment(self, instance_params, seed):
+        """
+        Given instance parameters to pass on to the data generating processes,
+        runs an experiment on a single randomly generated instance of data and returns the
+        parameter estimates for each method and the evaluated metrics for each method.
+
+        Parameters
+        ----------
+        instance_params : dictionary
+            instance parameters that DGP functions may use
+        seed : int
+            random seed for random data generation
+
+        Returns
+        -------
+        experiment_results : dictionary
+            results of the experiment, depending on what the methods return.
+            These are stored by dgp_name and then by method_name.
+        true_params : dictionary
+            true parameters of the DGP, indexed by dgp_name, used for metrics
+            calculation downstream
+        """
+        np.random.seed(seed)
+
+        experiment_results = {}
+        true_params = {}
+        for dgp_name, dgp_fn in self.config['dgps'].items():
+            data, true_param = dgp_fn(self.config['dgp_opts'][dgp_name], instance_params[dgp_name], seed)
+            true_params[dgp_name] = true_param
+            experiment_results[dgp_name] = {}
+
+            for method_name, method in self.config['methods'].items():
+                experiment_results[dgp_name][method_name] = method(data, self.config['method_opts'][method_name], seed)
+
+        return experiment_results, true_params
+
+    def run(self):
+        """
+        Runs multiple experiments in parallel on randomly generated instances and samples and returns
+        the results for each method and the evaluated metrics for each method across all
+        experiments.
+
+        Returns
+        -------
+        simulation_results : dictionary
+            dictionary indexed by [dgp_name][method_name] for individual experiment results
+        metric_results : dictionary
+            dictionary indexed by [dgp_name][method_name][metric_name]
+        true_params : dictionary
+            dictionary indexed by [dgp_name]
+        """
+        random_seed = self.config['mc_opts']['seed']
+
+        if not os.path.exists(self.config['target_dir']):
+            os.makedirs(self.config['target_dir'])
+
+        instance_params = {}
+        for dgp_name in self.config['dgps']:
+            instance_params[dgp_name] = self.config['dgp_instance_fns'][dgp_name](self.config['dgp_opts'][dgp_name], random_seed)
+
+        # results_file = os.path.join(self.config['target_dir'], 'results_{}.jbl'.format(self.config['param_str']))
+        results_file = os.path.join(self.config['target_dir'], 'results_seed{}.jbl'.format(random_seed))
+        if self.config['reload_results'] and os.path.exists(results_file):
+            results = joblib.load(results_file)
+        else:
+            results = Parallel(n_jobs=-1, verbose=1)(
+                    delayed(self.experiment)(instance_params, random_seed + exp_id)
+                    for exp_id in range(self.config['mc_opts']['n_experiments']))
+            joblib.dump(results, results_file)
+
+        simulation_results = {} # collects, per dgp and method, the list of individual experiment results returned by experiment()
+        metric_results = {}
+        true_params = {}
+        for dgp_name in self.config['dgps'].keys():
+            simulation_results[dgp_name] = {}
+            metric_results[dgp_name] = {}
+            for method_name in self.config['methods'].keys():
+                simulation_results[dgp_name][method_name] = [results[i][0][dgp_name][method_name] for i in range(self.config['mc_opts']['n_experiments'])]
+                true_params[dgp_name] = [results[i][1][dgp_name] for i in range(self.config['mc_opts']['n_experiments'])]
+                metric_results[dgp_name][method_name] = {}
+                for metric_name, metric_fn in self.config['metrics'].items():
+                # for metric_name, metric_fn in self.config['metrics'][method_name].items(): # for method specific parameters
+                    metric_results[dgp_name][method_name][metric_name] = metric_fn(simulation_results[dgp_name][method_name], true_params[dgp_name])
+
+        for plot_name, plot_fn in self.config['plots'].items():
+        # for plot_name, plot_fn in self.config['plots'][method_name].items(): # for method specific plots
+            if isinstance(plot_fn, dict):
+                plotting.instance_plot(plot_name, simulation_results, metric_results, self.config, plot_fn)
+            else:
+                plot_fn(plot_name, simulation_results, metric_results, true_params, self.config)
+
+        return simulation_results, metric_results, true_params
+
+class MonteCarloSweep:
+    """
+    This class contains methods to run sets of multiple monte carlo experiments
+    where each set of experiments has different parameters (for the dgps and methods, etc.).
+    This enables sweeping through parameter values to generate results for each permutation
+    of parameters. For example, running a simulation when the number of samples a specific DGP
+    generates is 100, 1000, or 10000.
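+
+    A minimal sketch of such a sweep (assuming the dgp_opts from dml_te_config.py): set
+    "type" to 'sweep_parameter' and give list-valued options, e.g.
+
+        'dgp_opts': {'dgp1': {'n_samples': [100, 1000, 10000], 'n_features': 1,
+                              'n_controls': 30, 'support_size': 5}}
+
+    Any dgp option whose value has a length (e.g. a list) is swept over, and a full
+    MonteCarlo run is performed for every combination of swept values.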
+    """
+
+    def __init__(self, config):
+        self.config = config
+        check_valid_config(self.config)
+        config['param_str'] = '_'.join(['{}_{}'.format(filesafe(k), self.stringify_param(v)) for k,v in self.config['mc_opts'].items()])
+        config['param_str'] += '_' + '_'.join(['{}_{}'.format(filesafe(k), self.stringify_param(v)) for k,v in self.config['dgp_opts'].items()])
+        config['param_str'] += '_' + '_'.join(['{}_{}'.format(filesafe(k), self.stringify_param(v)) for k,v in self.config['method_opts'].items()])
+
+    def stringify_param(self, param):
+        """
+        Parameters
+        ----------
+        param : list
+            list denoting the various values a parameter should take
+
+        Returns
+        -------
+        A string representation of the range of the values that parameter will take
+        """
+        if hasattr(param, "__len__"):
+            return '{}_to_{}'.format(np.min(param), np.max(param))
+        else:
+            return param
+
+    def run(self):
+        """
+        Runs many monte carlo simulations for all the permutations of parameters
+        specified in the config file.
+
+        Returns
+        -------
+        sweep_keys : list
+            list of all the permutations of parameters for each dgp
+        sweep_sim_results : list
+            list of simulation results for each permutation of parameters for each dgp
+        sweep_metrics : list
+            list of metric results for each permutation of parameters for each dgp
+        sweep_true_params : list
+            list of true parameters for each permutation of parameters for each dgp 
+        """
+        # currently this duplicates computation across dgps, because each inst_config only
+        # varies one dgp's parameters at a time; ideally every inst_config would differ for each dgp
+        for dgp_name in self.config['dgp_opts'].keys():
+            dgp_sweep_params = []
+            dgp_sweep_param_vals = []
+            for dgp_key, dgp_val  in self.config['dgp_opts'][dgp_name].items():
+                if hasattr(dgp_val, "__len__"):
+                    dgp_sweep_params.append(dgp_key)
+                    dgp_sweep_param_vals.append(dgp_val)
+            sweep_keys = []
+            sweep_sim_results = []
+            sweep_metrics = []
+            sweep_true_params = []
+            inst_config = deepcopy(self.config)
+            for vec in product(*dgp_sweep_param_vals):
+                setting = list(zip(dgp_sweep_params, vec))
+                for k,v in setting:
+                    inst_config['dgp_opts'][dgp_name][k] = v
+                simulation_results, metrics, true_params = MonteCarlo(inst_config).run()
+                sweep_keys.append(setting)
+                sweep_sim_results.append(simulation_results)
+                sweep_metrics.append(metrics)
+                sweep_true_params.append(true_params)
+
+        for plot_name, plot_fn in self.config['sweep_plots'].items():
+            if isinstance(plot_fn, dict):
+                plotting.sweep_plot(plot_name, sweep_keys, sweep_sim_results, sweep_metrics, self.config, plot_fn)
+            else:
+                plot_fn(plot_name, sweep_keys, sweep_sim_results, sweep_metrics, sweep_true_params, self.config)
+
+        return sweep_keys, sweep_sim_results, sweep_metrics, sweep_true_params
diff --git a/montecarlo/mcpy/plotting.py b/montecarlo/mcpy/plotting.py
new file mode 100644
index 000000000..4f97bd07b
--- /dev/null
+++ b/montecarlo/mcpy/plotting.py
@@ -0,0 +1,78 @@
+import os
+import numpy as np
+import matplotlib
+# matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+plt.style.use('ggplot')
+from mcpy.utils import filesafe
+import mcpy.metrics
+import itertools
+
+# example-dgp-specific
+def plot_sweep(plot_name, sweep_keys, sweep_params, sweep_metrics, sweep_true_params, config):
+    pass
+
+# example-dgp-specific
+def plot_metrics(plot_name, experiment_results, metric_results, true_params, config):
+    """
+    Plots all metrics that are listed in the metrics section of a config file
+    which are not single summary statistics. Thus, this plots metrics that are a
+    function of the points X. These metrics are all based on Theta(X).
+    """
+    for dgp_name in config['dgps'].keys(): # just one right now
+        for metric_name in config['metrics'].keys():
+            if metric_name not in config['single_summary_metrics']:
+                for method_name in config['methods'].keys():
+                    x, y = metric_results[dgp_name][method_name][metric_name]
+                    plt.plot(x, y, label=method_name)
+                plt.xlabel("X_test")
+                plt.ylabel(metric_name)
+                plt.legend()
+                plt.savefig(plot_name + "_" + metric_name)
+                plt.show()
+
+# example-dgp-specific
+def plot_visualization(plot_name, experiment_results, metric_results, true_params, config):
+    """
+    Plots the results of each method for each dgp vs. the true effect.
+    """
+    X_test = []
+    for dgp_name in config['dgps'].keys(): # just one right now
+        for method_name in config['methods'].keys():
+            X_test = experiment_results[dgp_name][method_name][0][0][0]
+            pred = np.array([experiment_results[dgp_name][method_name][i][0][1] for i in range(len(experiment_results[dgp_name][method_name]))])
+            mean = np.mean(pred, axis=0)
+            plt.plot(X_test, mean, label=method_name)
+            plt.xlabel("X_test")
+            plt.ylabel("Treatment Effect")
+            lb = np.array([experiment_results[dgp_name][method_name][i][1][0] for i in range(len(experiment_results[dgp_name][method_name]))])
+            ub = np.array([experiment_results[dgp_name][method_name][i][1][1] for i in range(len(experiment_results[dgp_name][method_name]))])
+            lb_ = np.min(lb, axis=0)
+            ub_ = np.max(ub, axis=0)
+            plt.fill_between(X_test.flatten(), lb_, ub_, alpha=0.25)  # flatten assumes n_features == 1, as in the example config
+
+        true = true_params[dgp_name][0]
+        plt.plot(X_test, true, label='true effect')
+        plt.legend()
+        plt.savefig(plot_name)
+        plt.show()
+
+# example-dgp-specific
+def plot_violin(plot_name, experiment_results, metric_results, true_params, config):
+    """
+    Plots all metrics that are single summary statistics, for each method. These
+    are single numbers produced for each method for each experiment. They are not a function
+    of X.
+    """
+    for dgp_name, dgp_fn in metric_results.items():
+        n_methods = len(list(dgp_fn.keys()))
+        for metric_name in next(iter(dgp_fn.values())).keys():
+            if metric_name in config['single_summary_metrics']:
+                plt.figure(figsize=(1.5 * n_methods, 2.5))
+                plt.violinplot([dgp_fn[method_name][metric_name] for method_name in dgp_fn.keys()], showmedians=True)
+                plt.xticks(np.arange(1, n_methods + 1), list(dgp_fn.keys()))
+                plt.ylabel(metric_name)
+                plt.tight_layout()
+                plt.savefig(plot_name)
+                plt.show()
diff --git a/montecarlo/mcpy/utils.py b/montecarlo/mcpy/utils.py
new file mode 100644
index 000000000..9847ea0f0
--- /dev/null
+++ b/montecarlo/mcpy/utils.py
@@ -0,0 +1,7 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import numpy as np
+
+def filesafe(s):
+    return "".join([c for c in s if c.isalpha() or c.isdigit() or c == ' ']).rstrip().replace(' ', '_')
diff --git a/montecarlo/run_mc_from_config.py b/montecarlo/run_mc_from_config.py
new file mode 100644
index 000000000..1c9756ce6
--- /dev/null
+++ b/montecarlo/run_mc_from_config.py
@@ -0,0 +1,31 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import sys
+import argparse
+from mcpy.monte_carlo import MonteCarlo, MonteCarloSweep
+import importlib
+
+def monte_carlo_main():
+    """
+    Entry point for running Monte Carlo simulations with specifications from a config file.
+    Run this file with the command line argument '--config [config_module]', where
+    [config_module] is the config module name without the .py extension.
+    """
+    parser = argparse.ArgumentParser(description='Process a config file for the monte carlo simulation.')
+    parser.add_argument('--config', type=str, help='config file')
+    args = parser.parse_args(sys.argv[1:])
+    config = importlib.import_module(args.config, __name__)
+
+    print("Running monte carlo simulation from {}.".format(args.config))
+    # one may want to define new config types
+    # the sweep over the parameters could be done differently depending on type
+    # imagine, a different MonteCarloSweep function
+    # or where MonteCarloSweep sweeps over the parameters differently
+    # depending on the type specified
+    if config.CONFIG['type'] == 'single_parameter':
+        MonteCarlo(config.CONFIG).run()
+    elif config.CONFIG['type'] == 'sweep_parameter':
+        MonteCarloSweep(config.CONFIG).run()
+    else:
+        raise ValueError("Unknown config type: {}".format(config.CONFIG['type']))
+
+if __name__=="__main__":
+    monte_carlo_main()