In the following example, we train a network with one hidden layer of 5 softplus neurons on random input and output data.
using MLPGradientFlow, Random
Random.seed!(123)
input = randn(2, 1_000) # 2-dimensional random input
target = randn(1, 1_000) # 1-dimensional random output
net = Net(; layers = ((5, softplus, true), # 5 softplus neurons with bias
                      (1, identity, true)), # 1 identity neuron with bias
            input,
            target)
p = random_params(net)
result = train(net, p, maxtime_ode = 20., maxtime_optim = 20., n_samples_trajectory = 10^3)
Dict{String, Any} with 20 entries:
"gnorm" => 0.000266414
"init" => Dict("w1"=>[0.149272 -0.196412 0.0; -0.329254 0.715402…
"x" => Dict("w1"=>[249.907 630.423 -562.057; -849.088 833.593…
"optim_stopped_by" => "maxtime > 20.0s"
"loss_curve" => [4.12157, 0.954065, 0.953853, 0.953699, 0.953576, 0.95…
"target" => [0.911875 -0.792403 … -0.313917 -1.1881]
"optim_iterations" => 35910
"ode_stopped_by" => "maxtime > 20.0s"
"ode_iterations" => 16215
"optim_time_run" => 20.0002
"converged" => false
"ode_time_run" => 20.0125
"loss" => 0.931382
"input" => [0.808288 -1.10464 … 0.723346 -0.0100866; -1.12207 -0.…
"trajectory" => OrderedDict(0.0=>Dict("w1"=>[0.149272 -0.196412 0.0; -…
"ode_x" => Dict("w1"=>[242.727 612.421 -546.012; -828.488 813.391…
"total_time" => 58.0817
"ode_loss" => 0.931382
"layerspec" => ((5, "softplus", true), (1, "identity", true))
"gnorm_regularized" => 0.000266414
We see that the optimization has found a point with a very small gradient:
gradient(net, params(result["x"]))
ComponentVector{Float64}(w1 = [-5.49742804717997e-10 -5.270526274886363e-10 -3.525930894062807e-11; -1.178423977009666e-10 6.30208657165246e-12 1.7580112100705025e-9; … ; -0.0002664141590097161 -8.01588469692512e-5 0.0001282238798960975; 0.00013277124072172842 4.0090816226306756e-5 -6.391578814725563e-5], w2 = [1.4680928733658804e-5 -4.757219152317312e-5 … -2.720766461922371e-7 -5.1857675189559415e-8])
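The reported "gnorm" appears to match the largest absolute gradient entry, i.e. the infinity norm of the gradient; assuming this, it can be checked directly (sketch):
using LinearAlgebra
g = gradient(net, params(result["x"]))
norm(g, Inf)   # ≈ 2.66e-4, the largest absolute gradient component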
Let us inspect the spectrum of the Hessian:
hessian_spectrum(net, params(result["x"]))
LinearAlgebra.Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
21-element Vector{Float64}:
1.5594428959234961e-12
1.5295958469681234e-11
4.9569177828406145e-11
2.608758161077601e-10
6.595889508548072e-8
1.688161914160224e-7
2.5988154065503106e-7
2.380974867547192e-6
0.0004177478695865345
0.002631646232980403
⋮
0.49734615192971177
5.042991225483392
18.57234904846318
37.37158586864807
40652.8008788391
63391.76118790042
95517.61546781032
600133.1975408694
1.8525709238761403e6
vectors:
21×21 Matrix{Float64}:
-0.0308835 -0.27617 0.0140424 … 4.53172e-8 1.16063e-8
0.465112 -0.0490118 0.00395827 -4.85093e-7 -3.41079e-7
4.09077e-7 -5.27456e-6 -1.86647e-5 0.0552058 0.346285
2.98568e-7 4.84051e-6 1.23959e-5 -0.102301 -0.685435
3.14216e-7 -6.97663e-6 6.24743e-6 0.047067 0.339665
-0.0771807 -0.692395 0.0352193 … -1.19692e-7 -2.03167e-8
-0.45611 0.0482199 -0.00388703 3.78723e-7 1.21703e-7
7.23838e-8 1.10161e-6 1.22571e-5 -0.382722 0.0491175
1.34287e-7 -5.42582e-6 -8.23323e-6 0.76004 -0.100631
1.12791e-7 1.78705e-6 -4.07833e-6 -0.377837 0.0516258
⋮ ⋱ ⋮
5.93749e-7 -2.67743e-6 -1.71091e-6 0.032243 -0.207586
5.65415e-7 -1.91331e-6 1.15858e-6 -0.071411 0.410723
5.48108e-7 -2.22178e-6 4.0556e-7 0.0392549 -0.203405
5.94903e-8 5.31652e-7 -2.70511e-8 … -0.0534202 -0.00723375
-5.18858e-7 5.45061e-8 -4.62889e-9 -0.326864 -0.162713
0.00209523 -0.140842 -0.655287 2.38252e-5 -0.000727835
-0.00228847 -0.0365286 0.748002 4.4219e-5 -0.000708195
0.000193399 0.177362 -0.0927152 6.39545e-5 -0.000686609
0.0 5.4938e-7 0.0 … 0.000104171 -0.000515339
The eigenvalues are all positive, indicating that we are in a local minimum.
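The returned factorization is a standard LinearAlgebra.Eigen, so its values and vectors fields can be inspected directly; for example, to look at the extreme eigenvalues (sketch):
spec = hessian_spectrum(net, params(result["x"]))
extrema(spec.values)      # smallest and largest eigenvalue
count(<(0), spec.values)  # number of negative eigenvalues (0 here)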
using CairoMakie
function plot_losscurve(result; kwargs...)
    f = Figure()
    ax = Axis(f[1, 1]; yscale = log10, xscale = log10, ylabel = "loss", xlabel = "time", kwargs...)
    # shift the time points by 1 so that the first sample (t = 0) can be shown on a log axis
    scatter!(ax, collect(keys(result["trajectory"])) .+ 1, result["loss_curve"])
    f
end
plot_losscurve(result)
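Since the extra keyword arguments are forwarded to Axis, the plot can be customized at the call site, and the figure can be written to disk with Makie's save (the file name below is only illustrative):
f = plot_losscurve(result; title = "training loss")
save("losscurve.png", f)  # hypothetical output file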