Subject : I didn't invent these things (Was: Will a decoder-only transformer also work?)
From : janburse (at) *nospam* fastmail.fm (Mild Shock)
Groups : comp.lang.prolog
Date : 02. Mar 2025, 22:39:27
Message-ID : <vq2j6f$v95h$1@solani.org>
References : 1 2
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 SeaMonkey/2.53.20
Thank you for thinking that
I invented these things:
> Are you thinking that autoencoders
> could play a bigger role in tasks like
> language modeling
Nope, it is all in the papers, like here:
> **Attention Is All You Need**
> Vaswani et al., 2017
>
> https://arxiv.org/abs/1706.03762
The conclusion says it is the same architecture
as autoencoders:
> In this work, we presented the Transformer,
> the first sequence transduction model based
> entirely on attention, replacing the recurrent
> layers most commonly used in encoder-decoder
> architectures with multi-headed self-attention.
Same architecture, with a latent space between
encoder and decoder. Training the EN-DE ConvS2S
Ensemble model reported in Table 2 of the paper
(7.7e19 FLOPs) on my laptop GPU (roughly 3e13
FLOPs per second) would take:

7.7e19 / 3e13 = 2.6e6 seconds, about 1 month

If I tried to train GPT-4.5 (say 1e23 FLOPs)
on my laptop, it would take:

1e23 / 3e13 = 3.3e9 seconds, about 100 years
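To make the back-of-the-envelope arithmetic easy to
redo, here is a minimal sketch in Prolog (assuming
SWI-Prolog; the 3e13 FLOPs per second is my assumed
laptop GPU throughput, not a measured figure):

% Estimated training time: total training FLOPs
% divided by sustained GPU throughput in FLOPs/s.
training_days(TotalFlops, FlopsPerSec, Days) :-
    Seconds is TotalFlops / FlopsPerSec,
    Days is Seconds / 86400.

% ?- training_days(7.7e19, 3.0e13, Days).
% Days = 29.7...       (about one month)
% ?- training_days(1.0e23, 3.0e13, Days).
% Days = 38580.2...    (about a century)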
P.S.: The paper is from the same Vaswani et al.,
2017 that is referenced in the Python code of
the Grokking paper quoted below.
Mild Shock wrote:
Ok, my bad. You can of course also try a decoder-only.
Just like here in this Python code example:
> **Simple PyTorch Implementation of “Grokking”**
> We trained a standard decoder-only transformer (Vaswani et al., 2017)
> https://github.com/teddykoker/grokking
The transformer need not necessarily have an encoder
and a latent space. It can also be decoder-only.
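What makes a model decoder-only is essentially the
causal attention mask: query position I may attend to
key position J only when J =< I, so a token never sees
the future. A minimal sketch of just that mask in
Prolog (assuming SWI-Prolog; the predicate names are
my own, purely for illustration):

% causal_mask(+I, +J, -M): M = 1 if query position I
% may attend to key position J (J =< I), else M = 0.
causal_mask(I, J, 1) :- J =< I.
causal_mask(I, J, 0) :- J > I.

% One row of the N x N mask for query position I.
causal_row(I, N, Row) :-
    numlist(1, N, Js),
    findall(M, (member(J, Js), causal_mask(I, J, M)), Row).

% ?- causal_row(2, 4, Row).
% Row = [1, 1, 0, 0].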
Mild Shock wrote:
Very simple challenge conceptually: develop the idea
of Centipawn towards TicTacToe and implement the
game based on learning / training a transformer, and
then executing it. All written in Prolog itself! Optional
bonus exercise: make the execution NNUE style, i.e.
incremental evaluation of the transformer. (One possible
reading of the Centipawn idea is sketched after the links.)
Centipawn - Chess Wiki
https://chess.fandom.com/wiki/Centipawn
NNUE - Chess Programming Wiki
https://www.chessprogramming.org/NNUE
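Since the challenge asks to develop the idea of
Centipawn towards TicTacToe, here is one possible
reading as a starting point, a minimal sketch only
and not part of the original challenge: score a
position in steps of 100 by counting the lines still
open to each player. All predicate names are my own
invention (assuming SWI-Prolog for aggregate_all/3):

% A board is a list of 9 cells: x, o, or e (empty).
lines([[1,2,3],[4,5,6],[7,8,9],   % rows
       [1,4,7],[2,5,8],[3,6,9],   % columns
       [1,5,9],[3,5,7]]).         % diagonals

cells([], _, []).
cells([I|Is], Board, [C|Cs]) :-
    nth1(I, Board, C),
    cells(Is, Board, Cs).

opponent(x, o).
opponent(o, x).

% A line is still open for P if the opponent
% has no mark in it.
open_for(P, Cells) :-
    opponent(P, Q),
    \+ member(Q, Cells).

% "Centimark" score from x's point of view:
% 100 * (lines open for x - lines open for o).
centimark(Board, Score) :-
    lines(Ls),
    aggregate_all(count, (member(L, Ls),
        cells(L, Board, Cs), open_for(x, Cs)), Nx),
    aggregate_all(count, (member(L, Ls),
        cells(L, Board, Cs), open_for(o, Cs)), No),
    Score is 100 * (Nx - No).

% ?- centimark([x,e,e, e,o,e, e,e,e], Score).
% Score = -100.

A transformer trained for the challenge would then
learn such a score from games, and the NNUE-style
bonus amounts to updating it incrementally per move
instead of recomputing it from scratch.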