Wrong ideas about chatbots

Subject: Wrong ideas about chatbots
From: ram (at) *nospam* zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.misc
Date: 08. Jun 2025, 01:39:21
Organization: Stefan Ram
Message-ID: <parsing-20250608013417@ram.dialup.fu-berlin.de>
Ben Collver <bencollver@tilde.pink> wrote or quoted:
|         you'd start with an enormous quantity of text, then do a lot
|of computationally-intense statistical analysis to map out which
|words and phrases are most likely to appear near to one another.

  I had already explained why that was off, but let me give you
  all an example from a recent chat of mine with a chatbot.

  I asked the chatbot to write a program for left-associative parsing
  of English. He must have mixed up my "left-associative" with the
  more common "left-recursive" or figured I just said it wrong.

  He clearly did not know about this specific left-associative
  parsing method for natural languages. Even after I gave him
  the exact source, he still did not get it right.

  Then I laid it out for him [1]. After that, he wrote a program [2]
  for left-associative parsing of English. Here is what he produced:
  [3]. I also asked him for an explanation of the approach for laymen,
  so if you want to learn more about it, see [4].

  So tell me, how is real understanding like this supposed
  to happen if chatbots just work based on "a statistical
  analysis of which words often show up together"?

  [1] How I explained it to him, after his first program was
  not what I wanted

Maybe you're misled by applying standard terms to the special
NEWCAT approach.

It starts with an empty object and then repeatedly appends the next
word until the end of the text is reached. No big recursion there.

We get: empty + "the"

Now we need a grammar rule to see whether that combination is
possible. For this purpose, each of "empty" and "the" has a
category, which is a data structure holding its attributes, and
the grammar rule then checks whether two things with these
categories can be combined; if so, it creates the new sentence
start "empty + 'the'" with a new category given by the rule.
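As a minimal sketch of that single combination step (with hypothetical
category names and attributes, not the NEWCAT originals), this might
look like:

```python
# Categories as plain data structures (dicts of attributes).
# The names 'START', 'DET', 'NP-START', 'expects' are illustrative only.
CATEGORIES = {
    'empty': {'type': 'START'},
    'the':   {'type': 'DET'},
}

def rule(left, right):
    """Check whether two categories may combine; return the new category or None."""
    if left['type'] == 'START' and right['type'] == 'DET':
        # A determiner may open a sentence start; the result now expects a noun.
        return {'type': 'NP-START', 'expects': 'N'}
    return None

new_cat = rule(CATEGORIES['empty'], CATEGORIES['the'])
```

If the rule returns None, the combination "empty + 'the'" would simply
be rejected.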

Then we try to add "cat" to the sentence start. So the sentence
is built left to right; that's what's "left-associative" about it.

Finally, we add the "." after "fish" and then we may have
a complete sentence if everything was allowed by the rules.

This parsing is trivial. The crucial thing is to write the
categories and the rules so that all continuations of a
sentence start that are legal in English are allowed by the
rules and all others are rejected.

  [2] The parser for a tiny subset of English he wrote then,
      according to my above explanation


# Toy lexicon
LEXICON = {
    'the': ('DET', {}),
    'cat': ('N', {'number': 'sg'}),
    'cats': ('N', {'number': 'pl'}),
    'dog': ('N', {'number': 'sg'}),
    'dogs': ('N', {'number': 'pl'}),
    'and': ('CONJ', {}),
    'eats': ('V', {'number': 'sg'}),
    'eat': ('V', {'number': 'pl'}),
}

def get_category(word):
    if word not in LEXICON:
        raise ValueError(f"Unknown word: {word}")
    return LEXICON[word]

def combine(state, next_token):
    cat, features = next_token
    print(f"    [combine] State: {state}, Next: ({cat}, {features})")
    # If state is None, we're at the start
    if state is None:
        if cat == 'DET':
            print("    [combine] Start DET")
            return ('subj', {'number': None, 'is_complete': False, 'needs_conj': False, 'pending_det': True})
        elif cat == 'N':
            print("    [combine] Start bare N (fail)")
            return None
        else:
            print("    [combine] Start fail")
            return None
    # If we're building a subject
    if state[0] == 'subj':
        subj = state[1]
        # --- Coordination context: CONJ + DET + N ---
        if subj.get('needs_conj') and subj.get('pending_det') and cat == 'N':
            print("    [combine] CONJ + DET + N -> coordinated NP (plural)")
            return ('subj', {'number': 'pl', 'is_complete': True, 'needs_conj': False, 'pending_det': False})
        # --- Plain DET + N (not in coordination) ---
        if subj.get('pending_det') and cat == 'N':
            print("    [combine] DET + N -> NP")
            return ('subj', {'number': features['number'], 'is_complete': True, 'needs_conj': False, 'pending_det': False})
        # --- NP + CONJ ---
        if subj.get('is_complete') and cat == 'CONJ':
            print("    [combine] NP + CONJ")
            return ('subj', {'number': subj['number'], 'is_complete': False, 'needs_conj': True, 'pending_det': False})
        # --- CONJ + DET ---
        if subj.get('needs_conj') and cat == 'DET':
            print("    [combine] CONJ + DET")
            return ('subj', {'number': None, 'is_complete': False, 'needs_conj': True, 'pending_det': True, 'prev_number': subj['number']})
        # --- NP + V (agreement check) ---
        if subj.get('is_complete') and cat == 'V':
            print(f"    [combine] NP + V: subj number {subj['number']}, verb number {features['number']}")
            if subj['number'] == features['number']:
                print("    [combine] Agreement OK")
                return ('sent', {'number': subj['number']})
            else:
                print("    [combine] Agreement FAIL")
                return None
        print("    [combine] No rule applies (fail)")
        return None
    # If we already have a sentence, nothing more allowed
    if state[0] == 'sent':
        print("    [combine] Already sent (fail)")
        return None
    print("    [combine] No rule applies (fail)")
    return None

def parse(sentence):
    words = sentence.lower().split()
    state = None
    print(f"\nParsing: {sentence}")
    for word in words:
        print(f"  State: {state}, Next: {word}")
        cat = get_category(word)
        state = combine(state, cat)
        if state is None:
            print(f"  Failed at: {word}")
            return False
    print(f"  Final state: {state}")
    return state is not None and state[0] == 'sent'

# Test sentences
sentences = [
    "the cat eats",             # ✔️ singular
    "the cats eat",             # ✔️ plural
    "the cat and the dog eat",  # ✔️ coordination (plural)
    "the cat and the dogs eat", # ✔️ coordination (plural)
    "the cat and the dog eats", # ❌ subject-verb disagreement
    "the cats eats",            # ❌ subject-verb disagreement
    "the cat and eat",          # ❌ incomplete NP
    "cat eats",                 # ❌ missing DET
    "the cat and the",          # ❌ incomplete NP
]

for s in sentences:
    result = parse(s)
    print(f"Result: {'ACCEPT' if result else 'REJECT'}")

  [3] the abridged output of the parser

Parsing: the cat eats
Result: ACCEPT

Parsing: the cats eat
Result: ACCEPT

Parsing: the cat and the dog eat
Result: ACCEPT

Parsing: the cat and the dogs eat
Result: ACCEPT

Parsing: the cat and the dog eats
Result: REJECT

Parsing: the cats eats
Result: REJECT

Parsing: the cat and eat
Result: REJECT

Parsing: cat eats
Result: REJECT

Parsing: the cat and the
Result: REJECT

  [4] the chatbot explains it:

What is Left-Associative Parsing?

Left-associative parsing is a way of analyzing a sentence by
reading it from left to right, always combining the information
you've built up so far (the "sentence start" or "current state")
with the next word. 

- At every step, you only ever look at the current state and
  the next word - never more than that.

- You never look ahead or backtrack; you just keep moving
  forward.

- This is sometimes called "incremental" or "shift-reduce"
  parsing, but here we mean:  Always combine the current
  state with the next word, and update the state.
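Since each step only merges the state so far with the next word, the
whole loop is just a left fold. A sketch using Python's
functools.reduce (the bracketing function here is illustrative, not
taken from the script):

```python
from functools import reduce

words = "the cat eats".split()

# Left-associative combination: always merge the state so far
# with the next word, never looking ahead or back.
def combine(state, word):
    return f"({state} {word})"

print(reduce(combine, words))  # → ((the cat) eats)
```

The bracketing makes the left-to-right grouping visible: the result
nests to the left, exactly as the state is built up word by word.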

Why is this interesting?

- It's simple, fast, and models how humans often process
  language in real time.

- It forces you to encode all the information you'll need for
  future steps in the current state, because you never get to
  "look back".

What are Complex Categories?

In traditional grammar, you might label things as "Noun Phrase
(NP)", "Verb Phrase (VP)", etc. 

But in left-associative parsing, the current state must
carry all the information you'll need for the rest of the
parse. 

So, you use complex categories - data structures that store
not just the grammatical type, but also features like number
(singular/plural), whether you're in the middle of a coordination
("and"), whether you're expecting a noun, and so on.

In the script: 

- The state is a tuple like ('subj', {...}) or
  ('sent', {...}).

- The dictionary inside holds all the features you need:
  "number", "is_complete", "needs_conj", "pending_det", etc.


