Sujet : Credits go to Rolf Pfeiffer (Was: Progress via library(linear) (Was: Higher Order Logic Programming and Autograd))
De : janburse (at) *nospam* fastmail.fm (Mild Shock)
Groupes : comp.lang.prologDate : 16. Mar 2025, 23:07:47
Autres entêtes
Message-ID : <vr7i3h$tou$1@solani.org>
References : 1 2
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 SeaMonkey/2.53.20
What made me do the lillte prototype? Try this one,
it has a little Java code. But its a little ancient
technologie using the sigmoid activation function. And
it seems to me it uses some graph datastructure:
> Neural Networks
> Rolf Pfieffer et al. - 2012
>
https://www.ifi.uzh.ch/dam/jcr:00000000-7f84-9c3b-ffff-fffffb34b58a/NN20120315.pdf
I guess it corresponds to this here, which is a SWI-Prolog and C
hybrid, when using FANN_SIGMOID:
> FANN - Fast Artificial Neural Network
> Package for SWI-Prolog - 2018
>
https://www.swi-prolog.org/pack/list?p=plfannTranslating the Java code to Prolog from the Pfeiffer
paper into linear algebra using vectors and matrixes, I
have now a little piece of pure Prolog code, that runs
also in the Browser, that can already learn an
AND, and its using the ReLU activation function,
i.e. not the FANN_SIGMOID activation function anymore.
I simulated the bias by an extra input neuron
which is always 1, because I was to lazy to have
bias in the model. Sample output:
A -- 0.99 ---\
\
B -- 0.99 -----+-- ReLu -->
/
1 -- -0.98 --/
It can als learn an XOR. Libraries such as PyTorch
cooperate with optimizer libraries that provide a
variety of gradient search methods. One needs
to study how these library are architectured so that
they provide plug and play. Maybe can bring the same
architecture to Prolog:
> A Gentle Introduction to torch.autograd
>
> Next, we load an optimizer, in this case SGD with a
> learning rate of 0.01 and momentum of 0.9. We register all
> the parameters of the model in the optimizer.
>
> optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.htmlMild Shock schrieb:
Ok some progress report here. I have currently a
library(linear) in the working which is only a few
lines of code, but it provides vectors and matrixes.
One can use the library to define matrix exponentiation:
matexp(M, 1, M) :- !.
matexp(M, N, R) :- N mod 2 =:= 0, !,
I is N // 2,
matexp(M, I, H),
matmul(H, H, R).
matexp(M, N, R) :-
I is N-1,
matexp(M, I, H),
matmul(H, M, R).
And then do fancy stuff like answering the question
what are the last 8 digits of fibonacci(1000000):
?- time((fib(1000000, _X), Y is _X mod 10^8)).
% Zeit 28 ms, GC 0 ms, Lips 88857, Uhr 16.03.2025 22:48
Y = 42546875
The 28 ms execution time are not bad, since modulo was not
integrated into matexp/3, making it to compute the full fibonacci(1000000) before taking the modulo. Not sure whether
JavaScript bigint is faster or slower than GMP ?
So what can we do with library(linear) besides implementing
eval/3 and back/3 ? We can finally update a neural network
and do this iteratively. Using a very simple random pick
to choose some training data sample:
update([V], _, [V]) :- !.
update([V,M|L], [_,M3|R], [V,M4|S]) :-
maplist(maplist(compose(add,mul(0.1))), M3, M, M4),
update(L, R, S).
iter(0, _, N, N) :- !.
iter(I, Z, N, M) :-
random(R), K is floor(R*4)+1,
call_nth(data(Z, X, Y), K),
eval(N, X, U),
back(U, Y, V),
update(U, V, W),
J is I-1,
iter(J, Z, W, M).
Disclaimer: This is only a proof of concept. It mostlikely
doesn’t have all the finess of Python torch.autograd. Also
it uses a very simple update of the weights via μ Δwij with
μ = 0.1. But you can already use it to learn an AND
or to learn an XOR.
Mild Shock schrieb:
Somehow I shied away from implementing call/n for
my new Prolog system. I thought my new Prolog system
has only monomorphic caches , I will never be able to
>
replicate what I did for my old Prolog system with
arity polymorphic caches. This changed when I had
the idea to dynamically add a cache for the duration
>
of a higher order loop such as maplist/n, foldl/n etc…
>
So this is the new implementation of maplist/3:
>
% maplist(+Closure, +List, -List)
maplist(C, L, R) :-
sys_callable_cacheable(C, D),
sys_maplist(L, D, R).
>
% sys_maplist(+List, +Closure, -List)
sys_maplist([], _, []).
sys_maplist([X|L], C, [Y|R]) :-
call(C, X, Y),
sys_maplist(L, C, R).
>
Its similar as the SWI-Prolog implementation in that
it reorders the arguments for better first argument
indexing. But the new thing is sys_callable_cacheable/1,
>
which prepares the closure to be more efficiently
called. The invocation of the closure is already
quite fast since call/3 is implemented natively,
>
but the cache adds an itch more speed. Here some
measurements that I did:
>
/* SWI-Prolog 9.3.20 */
?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
maplist(succ,L,_),fail; true)), fail.
% 2,003,000 inferences, 0.078 CPU in 0.094 seconds
>
/* Scryer Prolog 0.9.4-350 */
?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
maplist(succ,L,_),fail; true)), fail.
% CPU time: 0.318s, 3_007_105 inferences
>
/* Dogelog Player 1.3.1 */
?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
maplist(succ,L,_),fail; true)), fail.
% Zeit 342 ms, GC 0 ms, Lips 11713646, Uhr 10.03.2025 09:18
>
/* realla Prolog 2.64.6-2 */
?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
maplist(succ,L,_),fail; true)), fail.
% Time elapsed 1.694s, 15004003 Inferences, 8.855 MLips
>
Not surprisingly SWI-Prolog is fastest. What was
a little surprise is that Scryer Prolog can do it quite
fast, possibly since they heavily use maplist/n all
>
over the place, they came up with things like '$fast_call'
etc.. in their call/n implementation. Trealla Prolog is
a little bit disappointing at the moment.
>