Auto Tune Hyper Parameters

31.05.2020


In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training.


Hyperparameters can be classified as model hyperparameters, which cannot be inferred while fitting the machine to the training set because they refer to the model selection task, or algorithm hyperparameters, which in principle have no influence on the performance of the model but affect the speed and quality of the learning process. An example of a model hyperparameter is the topology and size of a neural network. Examples of algorithm hyperparameters are learning rate and mini-batch size.

A hyperparameter defines part of a machine learning model's architecture and influences the values of other parameters (e.g., coefficients or weights). Hyperparameters are set before training the model, whereas parameters are learned during training. Hyperparameter selection and tuning can feel like somewhat of a mystery. A common practice, when there are two hyperparameters h1 and h2, has been to lay out a grid (a 2D matrix) with h1 on the x-axis and h2 on the y-axis.

Different model training algorithms require different hyperparameters; some simple algorithms (such as ordinary least squares regression) require none. Given these hyperparameters, the training algorithm learns the parameters from the data. For instance, LASSO is an algorithm that adds a regularization hyperparameter to ordinary least squares regression, which has to be set before estimating the parameters through the training algorithm.
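To make the distinction concrete, here is a minimal sketch. It assumes a single feature, no intercept, and the objective 0.5 * sum((y - w*x)^2) + lam * |w|, for which the LASSO solution has a closed form (soft-thresholding). The function name fit_lasso_1d and the data are illustrative only. lam is the hyperparameter fixed before fitting, and w is the parameter learned from the data.

```python
def fit_lasso_1d(xs, ys, lam):
    """Return the weight w minimizing 0.5*sum((y - w*x)^2) + lam*abs(w)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Soft-thresholding: shrink the correlation term toward zero by lam.
    if sxy > lam:
        return (sxy - lam) / sxx
    if sxy < -lam:
        return (sxy + lam) / sxx
    return 0.0


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_ols = fit_lasso_1d(xs, ys, lam=0.0)     # lam = 0 reduces to ordinary least squares
w_lasso = fit_lasso_1d(xs, ys, lam=5.0)   # a larger lam shrinks the learned weight
w_zero = fit_lasso_1d(xs, ys, lam=100.0)  # a large enough lam forces w to exactly 0
```

The same training routine yields different learned weights depending only on the value chosen for lam beforehand.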

Grid search: Using knowledge you have about the problem, identify ranges for the hyperparameters. Then select several points from those ranges, usually uniformly distributed. Hyperparameter tuning relies more on experimental results than theory, and thus the best method to determine the optimal settings is to try many different combinations and evaluate the performance of each model. However, evaluating each model only on the training set can lead to one of the most fundamental problems in machine learning: overfitting.
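The following sketch illustrates both points under simple assumptions: a uniformly spaced grid over one hyperparameter (the penalty lam of a one-feature ridge regression without intercept, which has the closed form w = sxy / (sxx + lam)), scored once on the training split and once on a held-out validation split. The function names and the data are hypothetical.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w*x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Illustrative data: a noisy training split and a held-out validation split.
train_x, train_y = [1.0, 2.0, 3.0], [1.5, 4.6, 6.4]
valid_x, valid_y = [1.5, 2.5, 3.5], [3.0, 5.0, 7.0]

# Uniformly spaced grid over the chosen range [0, 2] for the penalty.
grid = [0.0, 0.5, 1.0, 1.5, 2.0]

results = []
for lam in grid:
    w = fit_ridge_1d(train_x, train_y, lam)
    results.append((lam, mse(w, train_x, train_y), mse(w, valid_x, valid_y)))

best_by_train = min(results, key=lambda r: r[1])[0]
best_by_valid = min(results, key=lambda r: r[2])[0]
```

Selecting on training error always picks the unregularized fit here (lam = 0), while the validation split prefers a nonzero penalty. This is why candidates are compared on data not used for fitting.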

Considerations

The time required to train and test a model can depend upon the choice of its hyperparameters.^[1] A hyperparameter is usually of continuous or integer type, leading to mixed-type optimization problems.^[1] The existence of some hyperparameters is conditional upon the value of others, e.g. the size of each hidden layer in a neural network can be conditional upon the number of layers.^[1]

Difficulty learnable parameters

Usually, but not always, hyperparameters cannot be learned using well-known gradient-based methods (such as gradient descent or L-BFGS), which are commonly employed to learn parameters. These hyperparameters are those parameters describing a model representation that cannot be learned by common optimization methods but nonetheless affect the loss function. An example would be the tolerance hyperparameter for errors in support vector machines.

Untrainable parameters

Sometimes, hyperparameters cannot be learned from the training data because they aggressively increase the capacity of a model and can push the loss function to a bad minimum, overfitting to and picking up noise in the data, as opposed to correctly mapping the richness of the structure in the data. For example, if we treat the degree of a polynomial equation fitting a regression model as a trainable parameter, training would simply raise the degree until the model perfectly fit the data, giving small training error but bad generalization performance.

Tunability

Most performance variation can be attributed to just a few hyperparameters.^[2]^[1]^[3] The tunability of an algorithm, hyperparameter, or interacting hyperparameters is a measure of how much performance can be gained by tuning it.^[4] For an LSTM, while the learning rate followed by the network size are its most crucial hyperparameters,^[5] batching and momentum have no significant effect on its performance.^[6]

Although some research has advocated the use of mini-batch sizes in the thousands, other work has found the best performance with mini-batch sizes between 2 and 32.^[7]

Robustness

An inherent stochasticity in learning directly implies that the empirical hyperparameter performance is not necessarily its true performance.^[1] Methods that are not robust to simple changes in hyperparameters, random seeds, or even different implementations of the same algorithm cannot be integrated into mission critical control systems without significant simplification and robustification.^[8]

Reinforcement learning algorithms, in particular, require measuring their performance over a large number of random seeds, and also measuring their sensitivity to choices of hyperparameters.^[8] Their evaluation with a small number of random seeds does not capture performance adequately due to high variance.^[8] Some reinforcement learning methods, e.g. DDPG (Deep Deterministic Policy Gradient), are more sensitive to hyperparameter choices than others.^[8]
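As a rough illustration of why a single seed can mislead, the sketch below repeats a stochastic evaluation over several seeds and reports the mean and standard deviation. The function noisy_evaluation is a hypothetical stand-in for a real training run, and its toy score model is an assumption made only for this example.

```python
import random
import statistics


def noisy_evaluation(learning_rate, seed):
    """Hypothetical stand-in for a stochastic training run that returns a score."""
    rng = random.Random(seed)
    # Assumed toy model: the score peaks near learning_rate = 0.1,
    # plus seed-dependent noise.
    return 1.0 - abs(learning_rate - 0.1) + rng.gauss(0.0, 0.05)


def evaluate_over_seeds(learning_rate, seeds):
    """Return the mean and sample standard deviation of the score across seeds."""
    scores = [noisy_evaluation(learning_rate, s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


seeds = range(20)
mean_score, std_score = evaluate_over_seeds(0.1, seeds)
single_seed_score = noisy_evaluation(0.1, seed=0)
```

Reporting mean_score together with std_score gives a fairer picture of a hyperparameter setting than single_seed_score alone.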

Optimization

Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given test data.^[1] The objective function takes a tuple of hyperparameters and returns the associated loss.^[1]
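A minimal sketch of this formulation follows: an objective function that maps a tuple of hyperparameters to a loss, minimized here by plain random search. The toy loss surface is an assumption for illustration. In practice the objective would train a model and return its loss on held-out data.

```python
import random


def objective(hyperparameters):
    """Hypothetical objective: maps a (learning_rate, num_layers) tuple to a loss."""
    learning_rate, num_layers = hyperparameters
    # Assumed toy loss surface with its minimum at learning_rate=0.01, num_layers=3.
    return (learning_rate - 0.01) ** 2 + 0.1 * (num_layers - 3) ** 2


rng = random.Random(0)
candidates = [(rng.uniform(0.0, 0.1), rng.randint(1, 6)) for _ in range(50)]

# Evaluate every candidate tuple and keep the one with the lowest loss.
evaluated = [(objective(c), c) for c in candidates]
best_loss, best_hyperparameters = min(evaluated)
```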

Reproducibility

Apart from tuning hyperparameters, machine learning involves storing and organizing the parameters and results, and making sure they are reproducible.^[9] In the absence of a robust infrastructure for this purpose, research code often evolves quickly and compromises essential aspects like bookkeeping and reproducibility.^[10] Online collaboration platforms for machine learning go further by allowing scientists to automatically share, organize and discuss experiments, data, and algorithms.^[11]

A number of relevant services and open source software exist:

Services

Name	Interfaces
Comet.ml^[12]	Python^[13]
OpenML^[14]^[11]^[15]^[16]	REST, Python, Java, R^[17]
Weights & Biases^[18]	Python^[19]

Software

Name	Interfaces	Store
OpenML Docker^[14]^[11]^[15]^[16]	REST, Python, Java, R^[17]	MySQL
sacred^[9]^[10]	Python^[20]	file, MongoDB, TinyDB, SQL

References

1. Claesen, Marc; De Moor, Bart (2015). "Hyperparameter Search in Machine Learning". arXiv:1502.02127. Bibcode:2015arXiv150202127C.
2. Leyton-Brown, Kevin; Hoos, Holger; Hutter, Frank (January 27, 2014). "An Efficient Approach for Assessing Hyperparameter Importance": 754–762. Via proceedings.mlr.press.
3. van Rijn, Jan N.; Hutter, Frank (2017). "Hyperparameter Importance Across Datasets". arXiv:1710.04725. Bibcode:2017arXiv171004725V.
4. Probst, Philipp; Bischl, Bernd; Boulesteix, Anne-Laure (2018). "Tunability: Importance of Hyperparameters of Machine Learning Algorithms". arXiv:1802.09596. Bibcode:2018arXiv180209596P.
5. Greff, K.; Srivastava, R. K.; Koutník, J.; Steunebrink, B. R.; Schmidhuber, J. (October 23, 2017). "LSTM: A Search Space Odyssey". IEEE Transactions on Neural Networks and Learning Systems. 28 (10): 2222–2232. arXiv:1503.04069. doi:10.1109/TNNLS.2016.2582924. PMID 27411231.
6. Breuel, Thomas M. (2015). "Benchmarking of LSTM networks". arXiv:1508.02774. Bibcode:2015arXiv150802774B.
7. "Revisiting Small Batch Training for Deep Neural Networks" (2018). arXiv:1804.07612. Bibcode:2018arXiv180407612M.
8. Mania, Horia; Guy, Aurelia; Recht, Benjamin (2018). "Simple random search provides a competitive approach to reinforcement learning". arXiv:1803.07055. Bibcode:2018arXiv180307055M.
9. Greff, Klaus; Schmidhuber, Jürgen (2015). "Introducing Sacred: A Tool to Facilitate Reproducible Research" (PDF).
10. Greff, Klaus; et al. (2017). "The Sacred Infrastructure for Computational Research" (PDF).
11. Vanschoren, Joaquin; et al. (2014). "OpenML: networked science in machine learning". arXiv:1407.7722. Bibcode:2014arXiv1407.7722V.
12. "Comet.ml – Machine Learning Experiment Management".
13. Comet ML, Inc. "comet-ml: Supercharging Machine Learning". Via PyPI.
14. Van Rijn, Jan N.; Bischl, Bernd; Torgo, Luis; Gao, Bo; Umaashankar, Venkatesh; Fischer, Simon; Winter, Patrick; Wiswedel, Bernd; Berthold, Michael R.; Vanschoren, Joaquin (2013). "OpenML: A Collaborative Science Platform". Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. 7908. Springer, Berlin, Heidelberg. pp. 645–649. doi:10.1007/978-3-642-40994-3_46. ISBN 978-3-642-38708-1.
15. Vanschoren, Joaquin; van Rijn, Jan N.; Bischl, Bernd (2015). "Taking machine learning research online with OpenML" (PDF). Proceedings of the 4th International Conference on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, Volume 41. JMLR.org.
16. van Rijn, J. N. (2016-12-19). "Massively collaborative machine learning". Dissertation.
17. "OpenML". GitHub.
18. "Weights & Biases for Experiment Tracking and Collaboration".
19. "Monitor your Machine Learning models with PyEnv".
20. Greff, Klaus (2020-01-03). "sacred: Facilitates automated and reproducible experimental research". Via PyPI.

Retrieved from 'https://en.wikipedia.org/w/index.php?title=Hyperparameter_(machine_learning)&oldid=949510909'


APPLIES TO: Basic edition, Enterprise edition

Efficiently tune hyperparameters for your model using Azure Machine Learning. Hyperparameter tuning includes the following steps:

Define the parameter search space
Specify a primary metric to optimize
Specify early termination criteria for poorly performing runs
Allocate resources for hyperparameter tuning
Launch an experiment with the above configuration
Visualize the training runs
Select the best performing configuration for your model

What are hyperparameters?

Hyperparameters are adjustable parameters you choose to train a model that govern the training process itself. For example, to train a deep neural network, you decide the number of hidden layers in the network and the number of nodes in each layer prior to training the model. These values usually stay constant during the training process.

In deep learning / machine learning scenarios, model performance depends heavily on the hyperparameter values selected. The goal of hyperparameter exploration is to search across various hyperparameter configurations to find a configuration that results in the best performance. Typically, the hyperparameter exploration process is painstakingly manual, given that the search space is vast and evaluation of each configuration can be expensive.

Azure Machine Learning allows you to automate hyperparameter exploration in an efficient manner, saving you significant time and resources. You specify the range of hyperparameter values and a maximum number of training runs. The system then automatically launches multiple simultaneous runs with different parameter configurations and finds the configuration that results in the best performance, measured by the metric you choose. Poorly performing training runs are automatically early terminated, reducing wastage of compute resources. These resources are instead used to explore other hyperparameter configurations.

Define search space

Automatically tune hyperparameters by exploring the range of values defined for each hyperparameter.


Types of hyperparameters

Each hyperparameter can be either discrete or continuous and has a distribution of values described by a parameter expression.

Discrete hyperparameters

Discrete hyperparameters are specified as a choice among discrete values. choice can be:

one or more comma-separated values
a range object
any arbitrary list object

In this case, batch_size takes on one of the values [16, 32, 64, 128] and number_of_hidden_layers takes on one of the values [1, 2, 3, 4].
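The snippet this sentence describes is not reproduced here. As a rough reconstruction of the idea only, the sketch below uses plain Python lists and a range object rather than the SDK's choice expression, and the name search_space is illustrative.

```python
import random

# Plain-Python stand-in: each discrete hyperparameter maps to its allowed values.
search_space = {
    "batch_size": [16, 32, 64, 128],         # explicit comma-separated values
    "number_of_hidden_layers": range(1, 5),  # a range object: 1, 2, 3, 4
}

# Draw one configuration by picking a value for every hyperparameter.
rng = random.Random(0)
sample = {name: rng.choice(list(values)) for name, values in search_space.items()}
```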

Advanced discrete hyperparameters can also be specified using a distribution. The following distributions are supported:

quniform(low, high, q) - Returns a value like round(uniform(low, high) / q) * q
qloguniform(low, high, q) - Returns a value like round(exp(uniform(low, high)) / q) * q
qnormal(mu, sigma, q) - Returns a value like round(normal(mu, sigma) / q) * q
qlognormal(mu, sigma, q) - Returns a value like round(exp(normal(mu, sigma)) / q) * q
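The formulas above translate almost directly into code. The sketch below implements them with Python's standard random module to show what values they produce. These are illustrative stand-ins that mirror the listed expressions, not the Azure Machine Learning objects themselves.

```python
import math
import random

rng = random.Random(0)


def quniform(low, high, q):
    return round(rng.uniform(low, high) / q) * q


def qloguniform(low, high, q):
    return round(math.exp(rng.uniform(low, high)) / q) * q


def qnormal(mu, sigma, q):
    return round(rng.gauss(mu, sigma) / q) * q


def qlognormal(mu, sigma, q):
    return round(math.exp(rng.gauss(mu, sigma)) / q) * q


# With q = 16, every draw is a multiple of 16, e.g. a candidate batch size.
batch_sizes = [quniform(16, 128, 16) for _ in range(5)]
```

The quantization step q is what turns a continuous draw into a discrete hyperparameter value.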

Continuous hyperparameters

Continuous hyperparameters are specified as a distribution over a continuous range of values. Supported distributions include:

uniform(low, high) - Returns a value uniformly distributed between low and high
loguniform(low, high) - Returns a value drawn according to exp(uniform(low, high)) so that the logarithm of the return value is uniformly distributed
normal(mu, sigma) - Returns a real value that's normally distributed with mean mu and standard deviation sigma
lognormal(mu, sigma) - Returns a value drawn according to exp(normal(mu, sigma)) so that the logarithm of the return value is normally distributed

An example of a parameter space definition:

This code defines a search space with two parameters - learning_rate and keep_probability. learning_rate has a normal distribution with mean value 10 and a standard deviation of 3. keep_probability has a uniform distribution with a minimum value of 0.05 and a maximum value of 0.1.
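The definition described above might be sketched as follows, using Python's standard random module as a stand-in for the SDK's normal and uniform parameter expressions. The helper sample_configuration is hypothetical and only shows what drawing one configuration from such a space looks like.

```python
import random

rng = random.Random(0)

# Plain-Python stand-in for the search space described above.
search_space = {
    "learning_rate": lambda: rng.gauss(10, 3),           # normal(mu=10, sigma=3)
    "keep_probability": lambda: rng.uniform(0.05, 0.1),  # uniform(low=0.05, high=0.1)
}


def sample_configuration(space):
    """Draw one value per hyperparameter."""
    return {name: draw() for name, draw in space.items()}


config = sample_configuration(search_space)
```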

Sampling the hyperparameter space

You can also specify the parameter sampling method to use over the hyperparameter space definition. Azure Machine Learning supports random sampling, grid sampling, and Bayesian sampling.

Picking a sampling method

Grid sampling can be used if your hyperparameter space can be defined as a choice among discrete values and if you have sufficient budget to exhaustively search over all values in the defined search space. Additionally, one can use automated early termination of poorly performing runs, which reduces wastage of resources.
Random sampling allows the hyperparameter space to include both discrete and continuous hyperparameters. In practice it produces good results most of the time and also allows the use of automated early termination of poorly performing runs. Some users perform an initial search using random sampling and then iteratively refine the search space to improve results.
Bayesian sampling leverages knowledge of previous samples when choosing hyperparameter values, effectively trying to improve the reported primary metric. Bayesian sampling is recommended when you have sufficient budget to explore the hyperparameter space - for best results with Bayesian Sampling we recommend using a maximum number of runs greater than or equal to 20 times the number of hyperparameters being tuned. Note that Bayesian sampling does not currently support any early termination policy.

Random sampling

In random sampling, hyperparameter values are randomly selected from the defined search space. Random sampling allows the search space to include both discrete and continuous hyperparameters.

Grid sampling

Grid sampling performs a simple grid search over all feasible values in the defined search space. It can only be used with hyperparameters specified using choice. For example, the following space has a total of six samples:
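The space referred to is not reproduced here. One assumed shape that yields six samples is a choice over three values combined with a choice over two values. The hyperparameter names below are illustrative.

```python
from itertools import product

# Assumed example space: 3 values x 2 values = 6 grid points.
search_space = {
    "num_hidden_layers": [1, 2, 3],
    "batch_size": [16, 32],
}

# Enumerate every combination of the listed values.
names = list(search_space)
grid = [dict(zip(names, values)) for values in product(*search_space.values())]
```

The grid size is the product of the number of choices per hyperparameter, so it grows quickly as hyperparameters are added.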

Bayesian sampling

Bayesian sampling is based on the Bayesian optimization algorithm and makes intelligent choices on the hyperparameter values to sample next. It picks the sample based on how the previous samples performed, such that the new sample improves the reported primary metric.

When you use Bayesian sampling, the number of concurrent runs has an impact on the effectiveness of the tuning process. Typically, a smaller number of concurrent runs can lead to better sampling convergence, since the smaller degree of parallelism increases the number of runs that benefit from previously completed runs.

Bayesian sampling only supports choice, uniform, and quniform distributions over the search space.

Note

Bayesian sampling does not support any early termination policy (See Specify an early termination policy). When using Bayesian parameter sampling, set early_termination_policy = None, or leave off the early_termination_policy parameter.

Specify primary metric

Specify the primary metric you want the hyperparameter tuning experiment to optimize. Each training run is evaluated for the primary metric. Poorly performing runs (where the primary metric does not meet criteria set by the early termination policy) will be terminated. In addition to the primary metric name, you also specify the goal of the optimization - whether to maximize or minimize the primary metric.

primary_metric_name: The name of the primary metric to optimize. The name of the primary metric needs to exactly match the name of the metric logged by the training script. See Log metrics for hyperparameter tuning.
primary_metric_goal: It can be either PrimaryMetricGoal.MAXIMIZE or PrimaryMetricGoal.MINIMIZE and determines whether the primary metric will be maximized or minimized when evaluating the runs.

Optimize the runs to maximize 'accuracy'. Make sure to log this value in your training script.


Log metrics for hyperparameter tuning

The training script for your model must log the relevant metrics during model training. When you configure the hyperparameter tuning, you specify the primary metric to use for evaluating run performance. (See Specify a primary metric to optimize.) In your training script, you must log this metric so it is available to the hyperparameter tuning process.

Log this metric in your training script with the following sample snippet:

The training script calculates the val_accuracy and logs it as 'accuracy', which is used as the primary metric. Each time the metric is logged it is received by the hyperparameter tuning service. It is up to the model developer to determine how frequently to report this metric.

Specify early termination policy

Terminate poorly performing runs automatically with an early termination policy. Termination reduces wastage of resources and instead uses these resources for exploring other parameter configurations.

When using an early termination policy, you can configure the following parameters that control when a policy is applied:

evaluation_interval: the frequency for applying the policy. Each time the training script logs the primary metric counts as one interval. Thus an evaluation_interval of 1 will apply the policy every time the training script reports the primary metric. An evaluation_interval of 2 will apply the policy every other time the training script reports the primary metric. If not specified, evaluation_interval is set to 1 by default.
delay_evaluation: delays the first policy evaluation for a specified number of intervals. It is an optional parameter that allows all configurations to run for an initial minimum number of intervals, avoiding premature termination of training runs. If specified, the policy applies every multiple of evaluation_interval that is greater than or equal to delay_evaluation.
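The interaction of the two parameters can be sketched as a small helper that lists the intervals at which a policy would be applied. The function name policy_intervals, the 1-based interval numbering, and the default delay of 0 are assumptions made for this example.

```python
def policy_intervals(total_intervals, evaluation_interval=1, delay_evaluation=0):
    """Intervals (1-based) at which an early termination policy is applied."""
    return [
        i
        for i in range(1, total_intervals + 1)
        # Every multiple of evaluation_interval that is >= delay_evaluation.
        if i % evaluation_interval == 0 and i >= delay_evaluation
    ]


every_report = policy_intervals(5)
# -> [1, 2, 3, 4, 5]
delayed = policy_intervals(12, evaluation_interval=2, delay_evaluation=5)
# -> [6, 8, 10, 12]
```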


Azure Machine Learning supports the following Early Termination Policies.

Bandit policy

Bandit is a termination policy based on slack factor/slack amount and evaluation interval. The policy early terminates any runs where the primary metric is not within the specified slack factor / slack amount with respect to the best performing training run. It takes the following configuration parameters:

slack_factor or slack_amount: the slack allowed with respect to the best performing training run. slack_factor specifies the allowable slack as a ratio. slack_amount specifies the allowable slack as an absolute amount, instead of a ratio.
For example, consider a Bandit policy being applied at interval 10. Assume that the best performing run at interval 10 reported a primary metric of 0.8 with a goal to maximize the primary metric. If the policy was specified with a slack_factor of 0.2, any training run whose best metric at interval 10 is less than approximately 0.67 (0.8/(1+slack_factor)) will be terminated. If instead the policy was specified with a slack_amount of 0.2, any training run whose best metric at interval 10 is less than 0.6 (0.8 - slack_amount) will be terminated.
evaluation_interval: the frequency for applying the policy (optional parameter).
delay_evaluation: delays the first policy evaluation for a specified number of intervals (optional parameter).

In this example, the early termination policy is applied at every interval when metrics are reported, starting at evaluation interval 5. Any run whose best metric is less than 1/(1+0.1), or about 91%, of the best performing run will be terminated.
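The threshold arithmetic for the Bandit policy can be checked with a short sketch. It covers only the maximize goal described above. The function names and run names are hypothetical, and this is not the service's implementation.

```python
def bandit_threshold(best_metric, slack_factor=None, slack_amount=None):
    """Lowest metric a run may report without being terminated (maximize goal)."""
    if (slack_factor is None) == (slack_amount is None):
        raise ValueError("specify exactly one of slack_factor or slack_amount")
    if slack_factor is not None:
        return best_metric / (1 + slack_factor)  # slack as a ratio
    return best_metric - slack_amount            # slack as an absolute amount


def runs_to_terminate(metrics_by_run, **slack):
    """Names of runs whose metric falls below the Bandit threshold."""
    best = max(metrics_by_run.values())
    threshold = bandit_threshold(best, **slack)
    return sorted(run for run, metric in metrics_by_run.items() if metric < threshold)


# Illustrative best metrics reported by four runs at interval 10.
metrics_at_interval_10 = {"run_a": 0.80, "run_b": 0.70, "run_c": 0.65, "run_d": 0.55}
by_factor = runs_to_terminate(metrics_at_interval_10, slack_factor=0.2)
by_amount = runs_to_terminate(metrics_at_interval_10, slack_amount=0.2)
```

With a best metric of 0.8, a slack_factor of 0.2 terminates run_c and run_d (both below roughly 0.67), while a slack_amount of 0.2 terminates only run_d (below 0.6).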

Median stopping policy

Median stopping is an early termination policy based on running averages of primary metrics reported by the runs. This policy computes running averages across all training runs and terminates runs whose performance is worse than the median of the running averages. This policy takes the following configuration parameters:

evaluation_interval: the frequency for applying the policy (optional parameter).
delay_evaluation: delays the first policy evaluation for a specified number of intervals (optional parameter).

In this example, the early termination policy is applied at every interval starting at evaluation interval 5. A run will be terminated at interval 5 if its best primary metric is worse than the median of the running averages over intervals 1:5 across all training runs.
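The rule described above can be sketched as follows, assuming a maximize goal and that every run has reported a metric at each interval. The function name median_stopping and the run histories are illustrative, not the service's implementation.

```python
import statistics


def median_stopping(history_by_run, interval):
    """Runs to stop at `interval` under a maximize goal.

    history_by_run maps a run name to its list of reported metrics.
    """
    # Running average of each run over intervals 1..interval.
    running_averages = {
        run: statistics.mean(history[:interval])
        for run, history in history_by_run.items()
    }
    median_of_averages = statistics.median(running_averages.values())
    # A run is stopped if its best metric so far is worse than that median.
    return sorted(
        run
        for run, history in history_by_run.items()
        if max(history[:interval]) < median_of_averages
    )


history = {
    "run_a": [0.50, 0.60, 0.70, 0.75, 0.80],
    "run_b": [0.40, 0.50, 0.55, 0.60, 0.62],
    "run_c": [0.20, 0.25, 0.30, 0.30, 0.32],
}
stopped = median_stopping(history, interval=5)
```

Here the running averages are about 0.67, 0.534, and 0.274, so the median is about 0.534 and only run_c, whose best metric is 0.32, is stopped.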

Truncation selection policy

Truncation selection cancels a given percentage of lowest performing runs at each evaluation interval. Runs are compared based on their performance on the primary metric and the lowest X% are terminated. It takes the following configuration parameters:

truncation_percentage: the percentage of lowest performing runs to terminate at each evaluation interval. Specify an integer value between 1 and 99.
evaluation_interval: the frequency for applying the policy (optional parameter).
delay_evaluation: delays the first policy evaluation for a specified number of intervals (optional parameter).

In this example, the early termination policy is applied at every interval starting at evaluation interval 5. A run will be terminated at interval 5 if its performance at interval 5 is in the lowest 20% of performance of all runs at interval 5.
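A minimal sketch of truncation selection under a maximize goal follows. How the service rounds the number of runs to cancel is not stated above, so rounding down is an assumption here, and the function and run names are hypothetical.

```python
def truncation_selection(metrics_by_run, truncation_percentage):
    """Runs to cancel under a maximize goal: the lowest X% by primary metric."""
    if not 1 <= truncation_percentage <= 99:
        raise ValueError("truncation_percentage must be between 1 and 99")
    ranked = sorted(metrics_by_run, key=metrics_by_run.get)  # worst first
    # Assumption: the number of runs to cancel is rounded down.
    n_cancel = len(ranked) * truncation_percentage // 100
    return ranked[:n_cancel]


# Ten illustrative runs with increasing metrics at interval 5.
metrics_at_interval_5 = {f"run_{i}": 0.50 + 0.04 * i for i in range(10)}
cancelled = truncation_selection(metrics_at_interval_5, truncation_percentage=20)
```

With ten runs and a truncation_percentage of 20, the two lowest-scoring runs are cancelled.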

No termination policy

If you want all training runs to run to completion, set policy to None. This will have the effect of not applying any early termination policy.

Default policy

If no policy is specified, the hyperparameter tuning service will let all training runs execute to completion.

Picking an early termination policy

If you are looking for a conservative policy that provides savings without terminating promising jobs, you can use a Median Stopping Policy with evaluation_interval 1 and delay_evaluation 5. These are conservative settings that can provide approximately 25%-35% savings with no loss on the primary metric (based on our evaluation data).
If you are looking for more aggressive savings from early termination, you can either use Bandit Policy with a stricter (smaller) allowable slack or Truncation Selection Policy with a larger truncation percentage.

Allocate resources


Control your resource budget for your hyperparameter tuning experiment by specifying the maximum total number of training runs. Optionally specify the maximum duration for your hyperparameter tuning experiment.

max_total_runs: Maximum total number of training runs that will be created. Upper bound - there may be fewer runs, for instance, if the hyperparameter space is finite and has fewer samples. Must be a number between 1 and 1000.
max_duration_minutes: Maximum duration in minutes of the hyperparameter tuning experiment. Parameter is optional, and if present, any runs that would be running after this duration are automatically canceled.

Note

If both max_total_runs and max_duration_minutes are specified, the hyperparameter tuning experiment terminates when the first of these two thresholds is reached.

Additionally, specify the maximum number of training runs to run concurrently during your hyperparameter tuning search.

max_concurrent_runs: Maximum number of runs to run concurrently at any given moment. If none specified, all max_total_runs will be launched in parallel. If specified, must be a number between 1 and 100.

Note

The number of concurrent runs is gated on the resources available in the specified compute target. Hence, you need to ensure that the compute target has the available resources for the desired concurrency.

Allocate resources for hyperparameter tuning:

This code configures the hyperparameter tuning experiment to use a maximum of 20 total runs, running four configurations at a time.
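The limits described in this section can be summarized in a small sketch. TuningBudget is a hypothetical container written for illustration, not an SDK class. It only encodes the stated bounds and the rule that all runs start in parallel when no concurrency limit is given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TuningBudget:
    """Hypothetical container for the limits described above."""

    max_total_runs: int
    max_concurrent_runs: Optional[int] = None
    max_duration_minutes: Optional[int] = None

    def __post_init__(self):
        if not 1 <= self.max_total_runs <= 1000:
            raise ValueError("max_total_runs must be between 1 and 1000")
        if self.max_concurrent_runs is not None and not 1 <= self.max_concurrent_runs <= 100:
            raise ValueError("max_concurrent_runs must be between 1 and 100")

    def effective_concurrency(self):
        # If unspecified, all runs are launched in parallel.
        if self.max_concurrent_runs is None:
            return self.max_total_runs
        return min(self.max_concurrent_runs, self.max_total_runs)


# 20 total runs, four configurations at a time.
budget = TuningBudget(max_total_runs=20, max_concurrent_runs=4)
```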

Configure experiment

Configure your hyperparameter tuning experiment using the defined hyperparameter search space, early termination policy, primary metric, and resource allocation from the sections above. Additionally, provide an estimator that will be called with the sampled hyperparameters. The estimator describes the training script you run, the resources per job (single or multi-gpu), and the compute target to use. Since concurrency for your hyperparameter tuning experiment is gated on the resources available, ensure that the compute target specified in the estimator has sufficient resources for your desired concurrency. (For more information on estimators, see how to train models.)

Configure your hyperparameter tuning experiment:

Submit experiment

Once you define your hyperparameter tuning configuration, submit an experiment:

experiment_name is the name you assign to your hyperparameter tuning experiment, and workspace is the workspace in which you want to create the experiment. (For more information on experiments, see How does Azure Machine Learning work?)

Warm start your hyperparameter tuning experiment (optional)

Often, finding the best hyperparameter values for your model can be an iterative process, needing multiple tuning runs that learn from previous hyperparameter tuning runs. Reusing knowledge from these previous runs will accelerate the hyperparameter tuning process, thereby reducing the cost of tuning the model and will potentially improve the primary metric of the resulting model. When warm starting a hyperparameter tuning experiment with Bayesian sampling, trials from the previous run will be used as prior knowledge to intelligently pick new samples, to improve the primary metric. Additionally, when using Random or Grid sampling, any early termination decisions will leverage metrics from the previous runs to determine poorly performing training runs.

Azure Machine Learning allows you to warm start your hyperparameter tuning run by leveraging knowledge from up to 5 previously completed / cancelled hyperparameter tuning parent runs. You can specify the list of parent runs you want to warm start from using this snippet:

Additionally, there may be occasions when individual training runs of a hyperparameter tuning experiment are cancelled due to budget constraints or fail due to other reasons. It is now possible to resume such individual training runs from the last checkpoint (assuming your training script handles checkpoints). Resuming an individual training run will use the same hyperparameter configuration and mount the outputs folder used for that run. The training script should accept the resume-from argument, which contains the checkpoint or model files from which to resume the training run. You can resume individual training runs using the following snippet:

You can configure your hyperparameter tuning experiment to warm start from a previous experiment or resume individual training runs using the optional parameters resume_from and resume_child_runs in the config:

Visualize experiment

The Azure Machine Learning SDK provides a Notebook widget that visualizes the progress of your training runs. The following snippet visualizes all your hyperparameter tuning runs in one place in a Jupyter notebook:

This code displays a table with details about the training runs for each of the hyperparameter configurations.

You can also visualize the performance of each of the runs as training progresses.

Additionally, you can visually identify the correlation between performance and values of individual hyperparameters using a Parallel Coordinates Plot.

You can visualize all your hyperparameter tuning runs in the Azure web portal as well. For more information on how to view an experiment in the web portal, see how to track experiments.

Find the best model

Once all of the hyperparameter tuning runs have completed, identify the best performing configuration and the corresponding hyperparameter values:

Sample notebook

Refer to train-hyperparameter-* notebooks in this folder:

Learn how to run notebooks by following the article Use Jupyter notebooks to explore this service.
