.. _model_specs:

********************
Model specifications
********************

The directory *src.model_specs* contains `JSON <http://www.json.org/>`_ files with model specifications.
They are used across different parts of the model to specify the simulations/calculations or make the
plotting uniform across different modules. I decided to split those specification in a lot of
different files to make it easier to change only certain parts of the specifications
without having to rerun the whole code in waf.

Overview for JSON files
=======================
All JSON files are used to define a dictionary in python. Below I will give
a short descriptions to all JSON files (also referred to as dictionary) and its
keys.
The default values are all inline with the descriptions in the final term paper
and hence omitted here.
There is no JSON, which describes the Data Generating Processes, as those are
fixed anyways in the *DataSimulation* class. Also, the order of them is fixed due to
the structure of the paper.

boston_settings.json
********************
The dictionary defines the simulation set-up that is specific to the boston simulation.

Keys
----
ratio_test: float
  Ratio for the test sample
ratio_train: float
  Counterpart to *ratio_test*
random_seed_split: int
  Defines the RandomState for the test_train_split
random_seed_fit: int
  Random seed for the fitting procedure

convergence_settings.json
*************************
The dictionary defines the simulation set-up that is specific to the convergence of the Bagging Algorithm.

Keys
----
max_bootstrap: int
  Maximum number of bootstraps in the range to be considered
min_bootstrap: int
  Minimum number of bootstraps in the range to be considered
steps_bootstrap: int
  Steps in the range between *min_bootstrap* and *max_bootstrap*
converged_bootstrap: int
  A large value of bootstrap iterations to visualize the convergence

finite_sample_settings.json
***************************
The dictionary that defines the simulation set-up for the finite sample case.

Keys
----
n_repeat: int
  Number of Monte Carlo repetitions
n_list: list
  List with the sample sizes to be considered
mu: int, float
  True mean of the population
sigma: int, float
  Standard deviation
b_iterations: int
  Number of bootstrap iterations
x_gridpoints: int
  Number of gridpoints
x_min: int
  Minimum gridpoint
x_max: int
  Maximal gridpoint
random_seed: int
 Random seed for the simulation

general_settings.json
*********************
The dictionary is shared across various simulations and defines the overall simulation set-up.

Keys
----
n_repeat: int
  Number of Monte Carlo repetitions
n_test_train: list
  List with the test and train size
noise: int, float
  Standard deviation of the error term for the data generating process
b_iterations: int
  Number of bootstrap iterations
min_split_tree: int
  Governs the tree depth. Lower values imply more complex Regression Trees
random_seeds: list
  List of random seeds used. Note: I don't reseed but define different RandomState instances with those.
bagging_ratio: constant at 1
  Subsampling ratio for bagging. Do not change!


normal_splits_settings.json
***************************
The dictionary defines the calculation set-up that is specific to the stump predictor simulation.

Keys
----
c_gridpoints: int
  Number of gridpoints for c
c_min: int
  Minimum gridpoint
c_max: int
  Maximal gridpoint
a_array: dictionary
  Consists of keys that define the subsampling ratios I want to consider.
  The value of the first key has to be equal to 1.
  The other key values are defined as lists, where list[0] = numerators and
  list[1] = denominator of the subsampling fraction.
gamma: float
  Rate of convergence of the estimator


settings_plotting.json
**********************
The dictionary contains all plotting specifications that are shared across various modules.

Keys
----
style: string
  Matplotlib stlye that is used for all plots
figsize: list
  List that defines the figure sizes
figsize_theory: list
  List that defines the figure sizes in the theory part
colors: dictionary
  Dictionary for uniform colors across figures
ls: dictionary
  Dictionary for uniform line style across figures

subagging_settings.json
***********************
The dictionary defines the simulation set-up that is specific to the subagging simulation.

Keys
----
n_ratios: int
  Number of subsampling ratios to be considered
max_ratio: int, float
  Maximal subsampling ratio
min_ratio: int, float
  Minimal subsampling ratio

toy_example_settings.json
*************************
The dictionary defines the calculation set-up that is specific to the introductory simulation.

Keys
----
c_gridpoints: int
  Number of gridpoints
c_min: int, float
    Minimal gridpoint
c_max: int, float
  Maximal gridpoint

tree_depth_settings.json
************************
The dictionary defines the simulation set-up that is specific to the tree depth simulation.

Keys
----
min_split: int
  Minimal split minimum for terminal nodes
max_split: int
  Maximal split minimum for terminal nodes
steps_split: int
  Steps within the range