In the following example, we train a network with one hidden layer of 5 softplus neurons on random input and output data.
using MLPGradientFlow, Random
Random.seed!(123)
input = randn(2, 1_000) # 2-dimensional random input
target = randn(1, 1_000) # 1-dimensional random output
net = Net(; layers = ((5, softplus, true), # 5 softplus neurons with bias
                      (1, identity, true)), # 1 identity neuron with bias
            input,
            target)
p = random_params(net)
result = train(net, p, maxtime_ode = 20., maxtime_optim = 20., n_samples_trajectory = 10^3)
Dict{String, Any} with 20 entries:
"gnorm" => 0.000266414
"init" => Dict("w1"=>[0.149272 -0.196412 0.0; -0.329254 0.715402…
"x" => Dict("w1"=>[249.907 630.423 -562.057; -849.088 833.593…
"optim_stopped_by" => "maxtime > 20.0s"
"loss_curve" => [4.12157, 0.954065, 0.953853, 0.953699, 0.953576, 0.95…
"target" => [0.911875 -0.792403 … -0.313917 -1.1881]
"optim_iterations" => 35910
"ode_stopped_by" => "maxtime > 20.0s"
"ode_iterations" => 16215
"optim_time_run" => 20.0002
"converged" => false
"ode_time_run" => 20.0125
"loss" => 0.931382
"input" => [0.808288 -1.10464 … 0.723346 -0.0100866; -1.12207 -0.…
"trajectory" => OrderedDict(0.0=>Dict("w1"=>[0.149272 -0.196412 0.0; -…
"ode_x" => Dict("w1"=>[242.727 612.421 -546.012; -828.488 813.391…
"total_time" => 58.0817
"ode_loss" => 0.931382
"layerspec" => ((5, "softplus", true), (1, "identity", true))
"gnorm_regularized" => 0.000266414
We see that the optimization has found a point with a very small gradient:
gradient(net, params(result["x"]))
ComponentVector{Float64}(w1 = [-5.49742804717997e-10 -5.270526274886363e-10 -3.525930894062807e-11; -1.178423977009666e-10 6.30208657165246e-12 1.7580112100705025e-9; … ; -0.0002664141590097161 -8.01588469692512e-5 0.0001282238798960975; 0.00013277124072172842 4.0090816226306756e-5 -6.391578814725563e-5], w2 = [1.4680928733658804e-5 -4.757219152317312e-5 … -2.720766461922371e-7 -5.1857675189559415e-8])
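The reported "gnorm" appears to match the largest absolute gradient entry, i.e. the infinity norm of the gradient; assuming this, it can be checked directly (sketch):
using LinearAlgebra
g = gradient(net, params(result["x"]))
norm(g, Inf)   # ≈ 2.66e-4, the largest absolute gradient component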
Let us inspect the spectrum of the Hessian:
hessian_spectrum(net, params(result["x"]))
LinearAlgebra.Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
21-element Vector{Float64}:
1.5594428959234961e-12
1.5295958469681234e-11
4.9569177828406145e-11
2.608758161077601e-10
6.595889508548072e-8
1.688161914160224e-7
2.5988154065503106e-7
2.380974867547192e-6
0.0004177478695865345
0.002631646232980403
⋮
0.49734615192971177
5.042991225483392
18.57234904846318
37.37158586864807
40652.8008788391
63391.76118790042
95517.61546781032
600133.1975408694
1.8525709238761403e6
vectors:
21×21 Matrix{Float64}:
-0.0308835 -0.27617 0.0140424 … 4.53172e-8 1.16063e-8
0.465112 -0.0490118 0.00395827 -4.85093e-7 -3.41079e-7
4.09077e-7 -5.27456e-6 -1.86647e-5 0.0552058 0.346285
2.98568e-7 4.84051e-6 1.23959e-5 -0.102301 -0.685435
3.14216e-7 -6.97663e-6 6.24743e-6 0.047067 0.339665
-0.0771807 -0.692395 0.0352193 … -1.19692e-7 -2.03167e-8
-0.45611 0.0482199 -0.00388703 3.78723e-7 1.21703e-7
7.23838e-8 1.10161e-6 1.22571e-5 -0.382722 0.0491175
1.34287e-7 -5.42582e-6 -8.23323e-6 0.76004 -0.100631
1.12791e-7 1.78705e-6 -4.07833e-6 -0.377837 0.0516258
⋮ ⋱ ⋮
5.93749e-7 -2.67743e-6 -1.71091e-6 0.032243 -0.207586
5.65415e-7 -1.91331e-6 1.15858e-6 -0.071411 0.410723
5.48108e-7 -2.22178e-6 4.0556e-7 0.0392549 -0.203405
5.94903e-8 5.31652e-7 -2.70511e-8 … -0.0534202 -0.00723375
-5.18858e-7 5.45061e-8 -4.62889e-9 -0.326864 -0.162713
0.00209523 -0.140842 -0.655287 2.38252e-5 -0.000727835
-0.00228847 -0.0365286 0.748002 4.4219e-5 -0.000708195
0.000193399 0.177362 -0.0927152 6.39545e-5 -0.000686609
0.0 5.4938e-7 0.0 … 0.000104171 -0.000515339
The eigenvalues are all positive, indicating that we are in a local minimum.
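The returned factorization is a standard LinearAlgebra.Eigen, so its values and vectors fields can be inspected directly; for example, to look at the extreme eigenvalues (sketch):
spec = hessian_spectrum(net, params(result["x"]))
extrema(spec.values)      # smallest and largest eigenvalue
count(<(0), spec.values)  # number of negative eigenvalues (0 here)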
using CairoMakie
function plot_losscurve(result; kwargs...)
    f = Figure()
    ax = Axis(f[1, 1]; yscale = log10, xscale = log10, ylabel = "loss", xlabel = "time", kwargs...)
    # shift the time points by 1 so that the first sample (t = 0) can be shown on a log axis
    scatter!(ax, collect(keys(result["trajectory"])) .+ 1, result["loss_curve"])
    f
end
plot_losscurve(result)
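Since the extra keyword arguments are forwarded to Axis, the plot can be customized at the call site, and the figure can be written to disk with Makie's save (the file name below is only illustrative):
f = plot_losscurve(result; title = "training loss")
save("losscurve.png", f)  # hypothetical output file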