Model specifications¶

The directory src.model_specs contains JSON files with model specifications. They are used across different parts of the model to specify the simulations and calculations and to keep the plotting uniform across modules. I decided to split the specifications into many separate files so that changing one part of them does not require rerunning the whole code in waf.

Overview for JSON files¶

Each JSON file defines a dictionary in Python. Below I give a short description of each JSON file (also referred to as a dictionary) and its keys. The default values are all in line with the descriptions in the final term paper and are hence omitted here. There is no JSON file that describes the data generating processes, as those are fixed in the DataSimulation class anyway. Their order is also fixed due to the structure of the paper.
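As a minimal sketch of how one of these files becomes a dictionary, the snippet below writes a small stand-in file and reads it back with the standard json module. The helper name load_spec, the temporary file and the values are assumptions for illustration only; the loading code in the project may look different.

```python
import json
import tempfile
from pathlib import Path


def load_spec(path):
    """Read one JSON specification file and return it as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Stand-in for a file in src.model_specs; the values are made up.
example = {"c_gridpoints": 100, "c_min": -5, "c_max": 5}

with tempfile.TemporaryDirectory() as tmp:
    spec_path = Path(tmp) / "toy_example_settings.json"
    spec_path.write_text(json.dumps(example), encoding="utf-8")
    settings = load_spec(spec_path)

# The keys described in this document are now ordinary dictionary keys.
print(settings["c_gridpoints"])
```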

boston_settings.json¶

The dictionary defines the simulation set-up that is specific to the Boston simulation.

Keys¶

ratio_test: float: Share of the observations assigned to the test sample
ratio_train: float: Share of the observations assigned to the training sample (counterpart to ratio_test)
random_seed_split: int: Defines the RandomState for the test_train_split
random_seed_fit: int: Random seed for the fitting procedure
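The following sketch shows one way the keys above could drive a reproducible split. It assumes that ratio_test and ratio_train sum to one and uses random.Random from the standard library as a stand-in for a seeded RandomState; the function split_indices, the sample size and all values are hypothetical.

```python
import random


def split_indices(n_obs, ratio_test, ratio_train, seed):
    """Shuffle the indices 0..n_obs-1 and split them into train and test."""
    if abs(ratio_test + ratio_train - 1.0) > 1e-9:
        raise ValueError("ratio_test and ratio_train must sum to 1")
    rng = random.Random(seed)  # stand-in for a seeded RandomState
    indices = list(range(n_obs))
    rng.shuffle(indices)
    n_test = round(n_obs * ratio_test)
    return indices[n_test:], indices[:n_test]


# Made-up values; the real defaults follow the term paper.
boston = {
    "ratio_test": 0.1,
    "ratio_train": 0.9,
    "random_seed_split": 1,
    "random_seed_fit": 2,
}

train_idx, test_idx = split_indices(
    500, boston["ratio_test"], boston["ratio_train"], boston["random_seed_split"]
)
print(len(train_idx), len(test_idx))
```

Keeping random_seed_split separate from random_seed_fit means the split can stay fixed while the seed of the fitting procedure is varied.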

convergence_settings.json¶

The dictionary defines the simulation set-up that is specific to the convergence of the Bagging Algorithm.

Keys¶

max_bootstrap: int: Maximum number of bootstraps in the range to be considered
min_bootstrap: int: Minimum number of bootstraps in the range to be considered
steps_bootstrap: int: Steps in the range between min_bootstrap and max_bootstrap
converged_bootstrap: int: A large value of bootstrap iterations to visualize the convergence
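A short sketch of how these keys could translate into the bootstrap counts that are evaluated. It assumes that steps_bootstrap is a step size (not a number of steps) and that max_bootstrap is included in the range; both are assumptions, and the values are made up.

```python
# Made-up values; the real defaults follow the term paper.
convergence = {
    "min_bootstrap": 2,
    "max_bootstrap": 50,
    "steps_bootstrap": 8,
    "converged_bootstrap": 500,
}

# Bootstrap counts from min to max (inclusive) in increments of steps_bootstrap.
bootstrap_grid = list(
    range(
        convergence["min_bootstrap"],
        convergence["max_bootstrap"] + 1,
        convergence["steps_bootstrap"],
    )
)

print(bootstrap_grid)
```

The result for converged_bootstrap would then serve as the reference value against which the counts in bootstrap_grid are compared.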

finite_sample_settings.json¶

The dictionary defines the simulation set-up for the finite sample case.

Keys¶

n_repeat: int: Number of Monte Carlo repetitions
n_list: list: List with the sample sizes to be considered
mu: int, float: True mean of the population
sigma: int, float: Standard deviation
b_iterations: int: Number of bootstrap iterations
x_gridpoints: int: Number of gridpoints
x_min: int: Minimum gridpoint
x_max: int: Maximal gridpoint
random_seed: int: Random seed for the simulation
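The three grid keys can be read as an evenly spaced grid. The sketch below builds such a grid in plain Python, assuming both endpoints are included (the same convention as the default of numpy.linspace); the helper linear_grid and the values are hypothetical.

```python
def linear_grid(lower, upper, n_points):
    """Return n_points evenly spaced values from lower to upper, inclusive."""
    if n_points < 2:
        raise ValueError("need at least two gridpoints")
    step = (upper - lower) / (n_points - 1)
    return [lower + i * step for i in range(n_points)]


# Made-up values; the real defaults follow the term paper.
finite_sample = {"x_gridpoints": 5, "x_min": -4, "x_max": 4}

x_grid = linear_grid(
    finite_sample["x_min"], finite_sample["x_max"], finite_sample["x_gridpoints"]
)
print(x_grid)
```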

general_settings.json¶

The dictionary is shared across various simulations and defines the overall simulation set-up.

Keys¶

n_repeat: int: Number of Monte Carlo repetitions
n_test_train: list: List with the test and train size
noise: int, float: Standard deviation of the error term for the data generating process
b_iterations: int: Number of bootstrap iterations
min_split_tree: int: Governs the tree depth. Lower values imply more complex Regression Trees
random_seeds: list: List of random seeds used. Note: I don’t reseed but define different RandomState instances with those.
bagging_ratio: constant at 1: Subsampling ratio for bagging. Do not change!
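The note on random_seeds can be illustrated as follows: instead of reseeding one global generator, each seed gets its own generator object. The sketch uses random.Random from the standard library as a stand-in for RandomState instances, and the seed values are made up.

```python
import random

# Made-up seeds; the real list lives in general_settings.json.
random_seeds = [1, 2, 3]

# One independent generator per seed; no global state is reseeded.
generators = [random.Random(seed) for seed in random_seeds]

# Each generator produces its own reproducible stream of draws.
draws = [rng.gauss(0.0, 1.0) for rng in generators]
print(draws)
```

Because every generator carries its own state, drawing from one of them does not change the sequence produced by the others.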

normal_splits_settings.json¶

The dictionary defines the calculation set-up that is specific to the stump predictor simulation.

Keys¶

c_gridpoints: int: Number of gridpoints for c
c_min: int: Minimum gridpoint
c_max: int: Maximal gridpoint
a_array: dictionary: Consists of keys that define the subsampling ratios I want to consider. The value of the first key has to be equal to 1. The values of the other keys are lists, where list[0] is the numerator and list[1] the denominator of the subsampling fraction.
gamma: float: Rate of convergence of the estimator
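To make the layout of a_array concrete, the sketch below converts such a dictionary into exact fractions and checks that the first value equals 1. The key names and values are hypothetical, and the helper to_ratio is not part of the project.

```python
from fractions import Fraction

# Hypothetical key names and values; only the layout follows the description.
a_array = {"bagging": 1, "subagging_1": [2, 3], "subagging_2": [1, 2]}


def to_ratio(value):
    """Turn 1 or a [numerator, denominator] list into a Fraction."""
    if isinstance(value, list):
        numerator, denominator = value
        return Fraction(numerator, denominator)
    return Fraction(value)


ratios = {name: to_ratio(value) for name, value in a_array.items()}

# Dictionaries keep insertion order, so the first key is the first one in the file.
first_key = next(iter(a_array))
if ratios[first_key] != 1:
    raise ValueError("the value of the first key has to be equal to 1")

print({name: float(ratio) for name, ratio in ratios.items()})
```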

settings_plotting.json¶

The dictionary contains all plotting specifications that are shared across various modules.

Keys¶

style: string: Matplotlib style that is used for all plots
figsize: list: List that defines the figure sizes
figsize_theory: list: List that defines the figure sizes in the theory part
colors: dictionary: Dictionary for uniform colors across figures
ls: dictionary: Dictionary for uniform line style across figures

subagging_settings.json¶

The dictionary defines the simulation set-up that is specific to the subagging simulation.

Keys¶

n_ratios: int: Number of subsampling ratios to be considered
max_ratio: int, float: Maximal subsampling ratio
min_ratio: int, float: Minimal subsampling ratio

toy_example_settings.json¶

The dictionary defines the calculation set-up that is specific to the introductory simulation.

Keys¶

c_gridpoints: int: Number of gridpoints
c_min: int, float: Minimal gridpoint
c_max: int, float: Maximal gridpoint

tree_depth_settings.json¶

The dictionary defines the simulation set-up that is specific to the tree depth simulation.

Keys¶

min_split: int: Smallest value considered for the minimum split size of terminal nodes
max_split: int: Largest value considered for the minimum split size of terminal nodes
steps_split: int: Steps within the range
