.. _model_specs:

********************
Model specifications
********************

The directory *src.model_specs* contains JSON files with model specifications. They are used across different parts of the model to specify the simulations and calculations and to keep the plotting uniform across modules. I decided to split the specifications into many separate files, which makes it easier to change only certain parts of them without having to rerun the whole code in waf.


Overview for JSON files
=======================

Each JSON file defines a dictionary in Python. Below I give a short description of each JSON file (also referred to as a dictionary) and its keys. The default values are all in line with the descriptions in the final term paper and are hence omitted here. There is no JSON file that describes the Data Generating Processes, as those are fixed anyway in the *DataSimulation* class. Their order is also fixed by the structure of the paper.


boston_settings.json
********************

The dictionary defines the simulation set-up that is specific to the Boston simulation.

Keys
----

ratio_test: float
    Ratio for the test sample

ratio_train: float
    Counterpart to *ratio_test*

random_seed_split: int
    Defines the RandomState for the test_train_split

random_seed_fit: int
    Random seed for the fitting procedure


convergence_settings.json
*************************

The dictionary defines the simulation set-up that is specific to the convergence of the Bagging Algorithm.

Keys
----

max_bootstrap: int
    Maximum number of bootstraps in the range to be considered

min_bootstrap: int
    Minimum number of bootstraps in the range to be considered

steps_bootstrap: int
    Steps in the range between *min_bootstrap* and *max_bootstrap*

converged_bootstrap: int
    A large number of bootstrap iterations used to visualize the convergence


finite_sample_settings.json
***************************

The dictionary defines the simulation set-up for the finite sample case.

Keys
----

n_repeat: int
    Number of Monte Carlo repetitions

n_list: list
    List with the sample sizes to be considered

mu: int, float
    True mean of the population

sigma: int, float
    Standard deviation

b_iterations: int
    Number of bootstrap iterations

x_gridpoints: int
    Number of gridpoints

x_min: int
    Minimum gridpoint

x_max: int
    Maximum gridpoint

random_seed: int
    Random seed for the simulation


general_settings.json
*********************

The dictionary is shared across various simulations and defines the overall simulation set-up.

Keys
----

n_repeat: int
    Number of Monte Carlo repetitions

n_test_train: list
    List with the test and train size

noise: int, float
    Standard deviation of the error term for the data generating process

b_iterations: int
    Number of bootstrap iterations

min_split_tree: int
    Governs the tree depth. Lower values imply more complex Regression Trees

random_seeds: list
    List of random seeds used. Note: I don't reseed but define different RandomState instances with those.

bagging_ratio: constant at 1
    Subsampling ratio for bagging. Do not change!


normal_splits_settings.json
***************************

The dictionary defines the calculation set-up that is specific to the stump predictor simulation.

Keys
----

c_gridpoints: int
    Number of gridpoints for c

c_min: int
    Minimum gridpoint

c_max: int
    Maximum gridpoint

a_array: dictionary
    Consists of keys that define the subsampling ratios I want to consider. The value of the first key has to be equal to 1. The other key values are defined as lists, where ``list[0]`` is the numerator and ``list[1]`` the denominator of the subsampling fraction.

gamma: float
    Rate of convergence of the estimator


settings_plotting.json
**********************

The dictionary contains all plotting specifications that are shared across various modules.

Keys
----

style: string
    Matplotlib style that is used for all plots

figsize: list
    List that defines the figure sizes

figsize_theory: list
    List that defines the figure sizes in the theory part

colors: dictionary
    Dictionary for uniform colors across figures

ls: dictionary
    Dictionary for uniform line styles across figures


subagging_settings.json
***********************

The dictionary defines the simulation set-up that is specific to the subagging simulation.

Keys
----

n_ratios: int
    Number of subsampling ratios to be considered

max_ratio: int, float
    Maximum subsampling ratio

min_ratio: int, float
    Minimum subsampling ratio


toy_example_settings.json
*************************

The dictionary defines the calculation set-up that is specific to the introductory simulation.

Keys
----

c_gridpoints: int
    Number of gridpoints

c_min: int, float
    Minimum gridpoint

c_max: int, float
    Maximum gridpoint


tree_depth_settings.json
************************

The dictionary defines the simulation set-up that is specific to the tree depth simulation.

Keys
----

min_split: int
    Smallest split minimum for terminal nodes to be considered

max_split: int
    Largest split minimum for terminal nodes to be considered

steps_split: int
    Steps within the range
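

To illustrate how such a file turns into a Python dictionary, here is a minimal sketch using the keys of *convergence_settings.json*. It is not the project's actual loading code. The numeric values, the helper names ``load_spec`` and ``bootstrap_grid``, and the reading of *steps_bootstrap* as a step size with an inclusive upper bound are all assumptions made for this example.

```python
import json
import os
import tempfile

# Hypothetical values for illustration only; the real defaults are those
# described in the term paper and are not reproduced here.
EXAMPLE_SPEC = {
    "min_bootstrap": 1,
    "max_bootstrap": 50,
    "steps_bootstrap": 7,
    "converged_bootstrap": 500,
}


def load_spec(path):
    """Read one JSON specification file into a Python dictionary."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def bootstrap_grid(spec):
    """Return the bootstrap counts to evaluate.

    Assumption: steps_bootstrap is the step size between consecutive
    values and max_bootstrap is included when the step lands on it.
    """
    return list(
        range(
            spec["min_bootstrap"],
            spec["max_bootstrap"] + 1,
            spec["steps_bootstrap"],
        )
    )


# Write the example to a temporary file so the sketch runs on its own,
# then read it back the same way a real specification file would be read.
with tempfile.TemporaryDirectory() as tmp_dir:
    spec_path = os.path.join(tmp_dir, "convergence_settings.json")
    with open(spec_path, "w", encoding="utf-8") as handle:
        json.dump(EXAMPLE_SPEC, handle)
    spec = load_spec(spec_path)

grid = bootstrap_grid(spec)
print(grid)  # [1, 8, 15, 22, 29, 36, 43, 50]
```

Since every specification lives in its own file, a module only needs to load the dictionaries it actually uses, which matches the reason given above for splitting the specifications.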