I didn't invent these things (Was: Will a decoder-only transformer also work?)

Subject : I didn't invent these things (Was: Will a decoder-only transformer also work?)
From : janburse (at) *nospam* fastmail.fm (Mild Shock)
Newsgroups : comp.lang.prolog
Date : 02. Mar 2025, 22:39:27
Message-ID : <vq2j6f$v95h$1@solani.org>
References : 1 2
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 SeaMonkey/2.53.20
Thank you for thinking that I would
invent these things:
 > Are you thinking that autoencoders
 > could play a bigger role in tasks like
 > language modeling
Nope, it is all in the papers, like here:
 > **Attention Is All You Need**
 > Vaswani et al., 2017
 > https://arxiv.org/abs/1706.03762
The conclusion says it is the same
architecture as autoencoders:
 > In this work, we presented the Transformer,
 > the first sequence transduction model based
 > entirely on attention, replacing the recurrent
 > layers most commonly used in encoder-decoder
 > architectures with multi-headed self-attention.
Same architecture, with a latent space between
encoder and decoder. Training the EN-DE ConvS2S
Ensemble model reported in Table 2 of the paper
on my laptop GPU would take:
7.7e19 FLOPs / 3e13 FLOP/s ≈ 2.6e6 s ≈ 1 month
If I tried to train GPT-4.5 on my
laptop, it would take:
1e23 FLOPs / 3e13 FLOP/s ≈ 3.3e9 s ≈ 100 years
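As a sanity check, the same back-of-envelope arithmetic
as a few lines of Python (the 3e13 FLOP/s laptop GPU
throughput is the figure assumed above):

# Back-of-envelope: training time = total FLOPs / device FLOP/s.
GPU_FLOPS = 3e13  # assumed laptop GPU throughput, as above

def train_seconds(total_flops, flops_per_sec=GPU_FLOPS):
    return total_flops / flops_per_sec

# EN-DE ConvS2S Ensemble, Table 2 of Vaswani et al., 2017:
print(train_seconds(7.7e19) / (30 * 24 * 3600), "months")   # ~1
# 1e23 FLOPs, the GPT-4.5 guess above:
print(train_seconds(1e23) / (365.25 * 24 * 3600), "years")  # ~105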
P.S.: It is the same Vaswani et al., 2017
paper that is referenced in the Python code
of the Grokking paper below.
Mild Shock wrote:
 Ok, my bad. You can of course also try a decoder-only.
Just like here in this Python code example:
  > **Simple PyTorch Implementation of “Grokking”**
 > We trained a standard decoder-only transformer (Vaswani et al., 2017)
 > https://github.com/teddykoker/grokking
The transformer need not necessarily have an encoder and
a latent space. It can also be decoder-only.
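For illustration, here is a minimal decoder-only sketch in
PyTorch. The sizes and names are my own, not taken from that
repository; the point is that a stack of self-attention blocks
with a causal mask is already a decoder-only transformer, with
no encoder and no separate latent space:

import torch
import torch.nn as nn

class DecoderOnly(nn.Module):
    # Hypothetical sizes, chosen only for the sketch.
    def __init__(self, vocab=128, d_model=128, nhead=4,
                 layers=2, max_len=64):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        block = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=4 * d_model,
            batch_first=True)
        # An "encoder" layer is just self-attention + feed-forward;
        # a decoder-only model is this stack plus a causal mask
        # (no cross-attention, hence no encoder or latent space).
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, x):  # x: (batch, seq) of token ids
        seq = x.size(1)
        h = self.tok(x) + self.pos(torch.arange(seq, device=x.device))
        causal = torch.triu(torch.full((seq, seq), float('-inf'),
                                       device=x.device), diagonal=1)
        return self.head(self.blocks(h, mask=causal))  # next-token logits

model = DecoderOnly()
logits = model(torch.randint(0, 128, (2, 16)))  # -> shape (2, 16, 128)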
Mild Shock wrote:
> Very simple challenge conceptually, develop the idea
> of Centipawn towards TicTacToe and implement the
> game based on learning / training a transformer, and
> then executing it. All written in Prolog itself! Optional
> bonus exercise, make the execution NNUE style, i.e.
> incremental evaluation of the transformer.
>
> Centipawn - Chess Wiki
> https://chess.fandom.com/wiki/Centipawn
>
> NNUE - Chess Programming Wiki
> https://www.chessprogramming.org/NNUE
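
For the optional bonus in the quoted challenge, here is a minimal
sketch in Python of what NNUE-style incremental evaluation means,
on a hypothetical tiny TicTacToe network (random weights and an
invented 18-feature board encoding, purely for illustration):
instead of recomputing the first layer after every move, only the
weight row of the feature that was switched on is added to an
accumulator.

import numpy as np

# Hypothetical network: 18 one-hot inputs (9 squares x 2 players),
# one hidden layer, scalar evaluation. Random weights, sketch only.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((18, 32))
b1 = rng.standard_normal(32)
W2 = rng.standard_normal(32)

def full_eval(feats):  # feats: 0/1 vector of length 18
    return np.tanh(feats @ W1 + b1) @ W2

acc = b1.copy()                # accumulator for the empty board
feats = np.zeros(18)
for square, player in [(4, 0), (0, 1), (8, 0)]:
    f = 2 * square + player    # feature index of the move just played
    feats[f] = 1.0
    acc += W1[f]               # O(hidden) update, no full matrix product
    # The incrementally maintained evaluation matches a full recompute:
    assert np.allclose(np.tanh(acc) @ W2, full_eval(feats))
    print("eval:", np.tanh(acc) @ W2)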
 
