Constructing Networks
MLPGradientFlow.Net — Type
Net(; layers, input, target, weights = nothing,
    bias_adapt_input = true, derivs = 2, copy_input = true, verbosity = 1,
    Din = size(input, 1) - last(first(layers))*(1-bias_adapt_input))
layers # ((num_neurons_layer1, activation_function_layer1, has_bias_layer1),
# (num_neurons_layer2, activation_function_layer2, has_bias_layer2),
# ...)
input # Dᵢₙ × N matrix
target # Dₒᵤₜ × N matrix
weights # nothing or an array of length N
bias_adapt_input = true # adds a row of 1s to the input
derivs = 2 # allocate memory for derivs derivatives (0, 1, 2)
copy_input = true # copy the input when creating the net
Example
julia> input = randn(2, 100)
julia> target = randn(1, 100)
julia> net = Net(; layers = ((10, softplus, true), (1, identity, true)),
                 input, target);
MLPGradientFlow.TeacherNet — Type
TeacherNet(; p = nothing, kwargs...)
Creates a network with parameters p attached. If p == nothing, random_params are generated. A TeacherNet is a callable object that returns the target given some input. Keyword arguments kwargs are passed to Net.
Example
julia> teacher = TeacherNet(; layers = ((8, softplus, true), (1, identity, true)),
                            Din = 3);
julia> input = randn(3, 10^4);
julia> target = teacher(input);
julia> new_input = randn(3, 10^3);
julia> new_target = teacher(new_input);
MLPGradientFlow.NetI — Type
NetI(teacher, student; T = eltype(student.input),
     g1 = _stride_arrayize(NormalIntegral(d = 1)),
     g2 = _stride_arrayize(NormalIntegral(d = 2)))
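The constructor combines a teacher and a student network; the Gauss-Hermite integrators g1 and g2 suggest that the loss is evaluated for standard normal input. A hedged sketch, assuming the student is an ordinary Net whose input and target serve only to fix its architecture:
julia> teacher = TeacherNet(; layers = ((4, softplus, true), (1, identity, true)), Din = 2);
julia> input = randn(2, 100);
julia> student = Net(; layers = ((6, softplus, true), (1, identity, true)),
                     input, target = teacher(input));
julia> neti = NetI(teacher, student);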
MLPGradientFlow.gauss_hermite_net — Function
gauss_hermite_net(target_function, net::Net; kwargs...)
Create from net a network with input points and weights obtained from NormalIntegral, to which kwargs are passed. The target_function is used to compute the target values. Note that in more than 2 input dimensions the number of points is excessively large (with default settings for NormalIntegral, more than a million points are generated in 3 dimensions).
Example
julia> net = gauss_hermite_net(x -> reshape(x[1, :] .^ 2, 1, :),
                               Net(layers = ((5, softplus, false),
                                             (1, identity, true)), Din = 2))
Loss and its Derivatives
MLPGradientFlow.loss — Function
loss(net, x, input = net.input, target = net.target;
     verbosity = 1, losstype = MSE(), weights = net.weights, maxnorm = Inf,
     merge = nothing)
Compute the loss of net at parameter value x.
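For instance, reusing the net from the construction example above and a random parameter vector (a minimal sketch; random_params is documented under Utilities below):
julia> x = random_params(net);
julia> loss(net, x)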
MLPGradientFlow.gradient — Function
gradient(net, x; input = net.input, target = net.target, kwargs...)
Compute the gradient of net at parameter value x. See loss for kwargs.
MLPGradientFlow.hessian — Function
hessian(net, x; input = net.input, target = net.target, kwargs...)
Compute the hessian of net at parameter value x. See loss for kwargs.
MLPGradientFlow.hessian_spectrum — Function
hessian_spectrum(net, x; kwargs...)
Compute the spectrum of the hessian of net at x. Keyword arguments are passed to hessian.
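Continuing the sketch above, the derivatives can be evaluated at the same parameter value:
julia> g = gradient(net, x);   # gradient with respect to the parameters
julia> H = hessian(net, x);    # hessian at x
julia> hessian_spectrum(net, x)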
Training
MLPGradientFlow.train — Function
train(net, x0; kwargs...)
Train net from initial point x0.
Keyword arguments:
maxnorm = Inf # constant maxnorm in the regularizer R(x) of the loss formula below
batchsize = nothing # use the full data set in each step when `nothing`
alg = alg_default(p) # ODE solver: KenCarp58() for length(p) ≤ 64, RK4() otherwise
maxT = 1e10 # upper integration limit
save_everystep = true # return a trajectory and loss curve
n_samples_trajectory = 100 # number of samples of the trajectory
abstol = 1e-6 # absolute tolerance of the ODE solver
reltol = 1e-3 # relative tolerance of the ODE solver
maxtime_ode = 3*60 # maximum amount of time in seconds for the ODE solver
maxiterations_ode = 10^6 # maximum iterations of ODE solver
maxiterations_optim = 10^5 # maximum iterations of optimizer
min_gnorm = 1e-15 # stop if (the regularized) gradient ∞-norm is below min_gnorm
patience = 10^6 # number of steps without decrease of the loss until convergence is declared
tauinv = nothing # nothing, a scalar or a ComponentArray of shape `x0` with inverse time scales
minloss = 2e-32 # stop if MSE loss is below minloss
maxtime_optim = 2*60 # maximum amount of time in seconds for the Newton optimization
optim_solver = optim_solver_default(p) # optimizer: NewtonTrustRegion() for length(p) ≤ 32, :LD_SLSQP for length(p) ≤ 1000, BFGS() otherwise
verbosity = 1 # verbosity level; higher values produce more output
show_progress = true # show progress
progress_interval = 5 # show progress every x seconds
result = :dict # change to result = :raw for more detailed results
exclude = String[] # dictionary keys to exclude from the results dictionary
include = nothing # dictionary keys to include (everything if nothing)
Runs training dynamics on net from initial point x0. If an Optimisers method, e.g. Adam() or Descent(), is chosen for alg, this is the time-discrete dynamics xₜ = xₜ₋₁ - tauinv * ∇ loss(net, xₜ₋₁), where loss(net, x) = sum((net(x) - net.target).^2) + R(x) with R(x) = 1/3 * (weightnorm(x) - maxnorm)^3 * I(weightnorm(x) > maxnorm) * n_samples(net). For ODE solvers, this is the continuous ordinary differential equation ẋ = -tauinv * ∇ loss(net, x), which is integrated from x(t = 0) = x0 to x(t = maxT). Note that maxT refers to the time of the differential equation, whereas maxtime_ode (and maxtime_optim) refer to the amount of wall-clock time given to the computer to run the algorithms.
If maxiterations_optim > 0, the result of this dynamics is passed to a (second-order) optimizer to accurately find the nearest minimum.
All gradient norms (min_gnorm, and the return values gnorm and gnorm_regularized) are measured in the infinity norm.
train(net, x0::Tuple; num_workers = min(length(x0), Sys.CPU_THREADS),
kwargs...)
Train from multiple initial points in parallel. kwargs are passed to the train function for each initial point.
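A minimal sketch of both call forms, reusing the net from above (the keyword values are illustrative only):
julia> x0 = random_params(net);
julia> res = train(net, x0; maxT = 1e3, maxtime_ode = 60);
julia> results = train(net, (random_params(net), random_params(net)));  # two initial points in parallel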
Utilities
MLPGradientFlow.random_params — Function
random_params(net; kwargs...)
random_params(rng, net; distr_fn = glorot_normal)
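A minimal usage sketch, assuming the returned object is a parameter vector compatible with net; an explicit RNG can be passed for reproducibility:
julia> using Random
julia> x = random_params(net);
julia> x_reproducible = random_params(Xoshiro(42), net);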
MLPGradientFlow.params — Function
params((w₁, b₁), (w₂, b₂), ...)
Where wᵢ is a weight matrix and bᵢ is a bias vector or nothing (None in python).
params(layers::AbstractDict)
Converts parameters in dictionary form to ComponentArray form.
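A minimal sketch of hand-built parameters for a 2-4-1 network (the dimensions are illustrative):
julia> w1, b1 = randn(4, 2), randn(4);
julia> w2 = randn(1, 4);
julia> p = params((w1, b1), (w2, nothing));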
MLPGradientFlow.params2dict — Function
params2dict(p)
Convert a ComponentArray parameter to a dictionary.
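Continuing the sketch above, converting back and forth between the two representations:
julia> d = params2dict(p);
julia> p2 = params(d);   # back to ComponentArray form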
MLPGradientFlow.NormalIntegral — Type
NormalIntegral(; N = 600, d = 1, prune = true, threshold = 1e-18)
Callable struct for Gauss-Hermite integration. Uses FastGaussQuadrature.jl.
Example
julia> integrator = NormalIntegral(d = 1);
julia> integrator(x -> 1.) # normalization of standard normal
0.9999999999999998
julia> integrator(identity) # mean of standard normal
7.578393534606704e-19
julia> integrator(x -> x^2) # variance of standard normal
0.9999999999999969
julia> integrator2d = NormalIntegral(d = 2);
julia> integrator2d(x -> cos(x[1] + x[2])) # integrate some 2D function for x[1] and x[2] iid standard normal
0.3678794411714446
julia> integrator2d(x -> cos(x[1] + x[2]), .5) # integrate with correlation(x[1], x[2]) = 0.5
0.22313016014843348
MLPGradientFlow.trajectory_distance — Function
trajectory_distance(res1, res2)
trajectory_distance(trajectory, reference_trajectory)
Searches for the closest points of trajectory in reference_trajectory and returns the distances, the time points of the reference_trajectory and the indices in the reference_trajectory.
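A hedged sketch, assuming res1 and res2 are results returned by train and that the three return values arrive as a tuple:
julia> res1 = train(net, random_params(net));
julia> res2 = train(net, random_params(net));
julia> dists, ts, idxs = trajectory_distance(res1, res2);  # distances, reference times, reference indices (assumed order)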
MLPGradientFlow.LinearSubspace — Type
LinearSubspace(ref, v1, v2)
Construct a 2D linear subspace from the point ref in directions v1 and v2. See also subspace_minloss.
MLPGradientFlow.to_local_coords — Function
to_local_coords(ls::LinearSubspace, p)
Project point p to the linear subspace ls.
MLPGradientFlow.subspace_minloss — Function
subspace_minloss(net, ref, v1, v2, a1, a2)
Minimize loss in the subspace orthogonal to v1 and v2 with the point in the 2D subspace fixed to ref + a1 * v1 + a2 * v2.
subspace_minloss(net, ls::LinearSubspace, a1, a2)
Minimize loss in the subspace orthogonal to ls with the point in ls fixed to ls.ref + a1 * ls.v1 + a2 * ls.v2.
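A hedged sketch of the intended loss-landscape workflow (the direction vectors and offsets are illustrative):
julia> ref = random_params(net);
julia> v1, v2 = random_params(net), random_params(net);
julia> ls = LinearSubspace(ref, v1, v2);
julia> to_local_coords(ls, ref)             # local coordinates of the reference point
julia> subspace_minloss(net, ls, 0.5, -0.5)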
MLPGradientFlow.grow_net — Function
grow_net(net)
Add one neuron to the hidden layer. Works only for networks with a single hidden layer.
MLPGradientFlow.shrink_net — Function
shrink_net(net)
Remove one neuron from the hidden layer. Works only for networks with a single hidden layer.
MLPGradientFlow.split_neuron — Function
split_neuron(p, i, γ, j = i+1)
Duplicate hidden neuron i with mixing ratio γ and insert the new neuron at position j. Works only for the parameters p of a network with a single hidden layer.
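A hedged sketch for a single-hidden-layer net (assuming grow_net and shrink_net return a new net, and split_neuron returns new parameters):
julia> larger  = grow_net(net);              # one more hidden neuron
julia> smaller = shrink_net(net);            # one fewer hidden neuron
julia> p = random_params(net);
julia> p_split = split_neuron(p, 1, 0.5);    # split neuron 1 with mixing ratio 0.5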
MLPGradientFlow.cosine_similarity — Function
cosine_similarity(x, y) = x'*y/(norm(x)*norm(y))
Saving
MLPGradientFlow.pickle — Function
pickle(filename, result; exclude = String[])
Save the result of training in filename in torch.pickle format. See also result2dict.
MLPGradientFlow.unpickle — Function
unpickle(filename)
Load results saved with pickle.
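A minimal round-trip sketch (the file name is illustrative):
julia> res = train(net, random_params(net));
julia> pickle("result.pt", res)
julia> res_loaded = unpickle("result.pt");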