Re: Script stops running with no error

Subject: Re: Script stops running with no error
From: list1 (at) *nospam* tompassin.net (Thomas Passin)
Newsgroups: comp.lang.python
Date: 28 Aug 2024, 23:32:16
Message-ID: <mailman.13.1724884345.2917.python-list@python.org>
References: 1 2
User-Agent: Mozilla Thunderbird

On 8/28/2024 5:09 PM, Daniel via Python-list wrote:
As you all have seen in my intro post, I am working on a project in Python
(which I'm learning as I go) that uses the Wikimedia API to pull data from
wiktionary.org. I want to parse the JSON and output, for now, just the
definition of the word.

Wiktionary is Wikimedia's dictionary.

My requirements for v1:

Query the API for the definition of "table" (the word is set in the Python script).
Pull the proper JSON.
Parse the JSON.
Output the definition only.

What's happening?

I run the script and, maybe I don't know shit from shinola, but it
appears I composed it properly. I wrote the script to do the above.
The Wiktionary JSON denotes a list item with the character # and
sublist items with ##, but the site renders them as numbers.

On Wiktionary, the definitions are displayed like:

1. blablabla
    1. blablabla
    2. blablablablabla
2. blablabla
3. blablabla
    1. blablabla

I wrote my script to alter this so that the sublist items are lettered:

1. blablabla
    a. blablabla
    b. blablabla
2. blablabla and so on

/snip

At this point, the script stops after it assesses the first main_counter
and sub_counter. The code is below; please tell me which stupid mistake
I made (I'm sure it's simple).

Is my approach a bad one? Is there an easier method of parsing JSON
than the way I'm doing it? I'm all ears.

Be kind, I'm really new at Python. Environment is Emacs.

import requests
import re

search_url = 'https://api.wikimedia.org/core/v1/wiktionary/en/search/page'
search_query = 'table'
parameters = {'q': search_query}

response = requests.get(search_url, params=parameters)
data = response.json()

page_id = None

if 'pages' in data:
    for page in data['pages']:
        title = page.get('title', '').lower()
        if title == search_query.lower():
            page_id = page.get('id')
            break

if page_id:
    content_url = f'https://api.wikimedia.org/core/v1/wiktionary/en/page/{search_query}'
    response = requests.get(content_url)
    page_data = response.json()
    if 'source' in page_data:
        content = page_data['source']
        cases = {'noun': r'\{en-noun\}(.*?)(?=\{|\Z)',
                 'verb': r'\{en-verb\}(.*?)(?=\{|\Z)',
                 'adjective': r'\{en-adj\}(.*?)(?=\{|\Z)',
                 'adverb': r'\{en-adv\}(.*?)(?=\{|\Z)',
                 'preposition': r'\{en-prep\}(.*?)(?=\{|\Z)',
                 'conjunction': r'\{en-con\}(.*?)(?=\{|\Z)',
                 'interjection': r'\{en-intj\}(.*?)(?=\{|\Z)',
                 'determiner': r'\{en-det\}(.*?)(?=\{|\Z)',
                 'pronoun': r'\{en-pron\}(.*?)(?=\{|\Z)'
                 # make sure there aren't more word types
        }

        def clean_definition(text):
            text = re.sub(r'\[\[(.*?)\]\]', r'\1', text)
            text = text.lstrip('#').strip()
            return text

        print(f"\n*** Definition for {search_query} ***")
        for word_type, pattern in cases.items():
            match = re.search(pattern, content, re.DOTALL)
            if match:
                lines = [line.strip() for line in
                         match.group(1).split('\n')
                         if line.strip()]
                definition = []
                main_counter = 0
                sub_counter = 'a'

                for line in lines:
                    if line.startswith('##*') or line.startswith('##:'):
                        continue

                    if line.startswith('# ') or line.startswith('#\t'):
                        main_counter += 1
                        sub_counter = 'a'
                        cleaned_line = clean_definition(line)
                        definition.append(f"{main_counter}. {cleaned_line}")
                    elif line.startswith('##'):
                        cleaned_line = clean_definition(line)
                        definition.append(f"   {sub_counter}. {cleaned_line}")
                        sub_counter = chr(ord(sub_counter) + 1)

                if definition:
                    print(f"\n{word_type.capitalize()}\n")
                    print("\n".join(definition))
                    break
else:
    print("try again beotch")

You need to check each part of the code to see whether you are getting or producing what you think you are. You should also create a text constant containing the JSON input you expect to get, and make sure you can process that. Start simple: one main item. Then two main items. Then two main items with one sub-item. And so on.
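
A minimal sketch of that incremental approach might look like the following. It assumes the definition lines have already been pulled out of the response as plain text. The function name `number_definitions` and the sample strings are made up for illustration and are not part of the original script; only the `#` / `##` markers and the "1." / "a." output format come from the post above.

```python
# Hypothetical helper and hand-written sample data, for illustration only.
# No network access is needed, so each step can be checked in isolation.

def number_definitions(wikitext):
    """Number '# ' lines as 1., 2., ... and '## ' lines as a., b., ..."""
    numbered = []
    main_counter = 0
    sub_letter = "a"
    for raw_line in wikitext.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            # Sub-item: letter it, then advance to the next letter.
            numbered.append(f"   {sub_letter}. {line[3:].strip()}")
            sub_letter = chr(ord(sub_letter) + 1)
        elif line.startswith("# "):
            # Main item: bump the number and restart the letters.
            main_counter += 1
            sub_letter = "a"
            numbered.append(f"{main_counter}. {line[2:].strip()}")
        # Anything else (blank lines, "##*" quotation lines) is skipped.
    return numbered


# Step 1: one main item.
assert number_definitions("# first sense") == ["1. first sense"]

# Step 2: two main items.
assert number_definitions("# one\n# two") == ["1. one", "2. two"]

# Step 3: main items with sub-items and a line that should be skipped.
SAMPLE_FULL = (
    "# first sense\n"
    "## first sub-sense\n"
    "##* a quotation line\n"
    "## second sub-sense\n"
    "# second sense\n"
)
result = number_definitions(SAMPLE_FULL)
print("\n".join(result))
```

If an assertion fails at one of the steps, that step is where the logic first departs from what you expected, which narrows the search far more than running the whole script against the live API.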

I'm not sure what you want to produce in the end, but this seems awfully complex to be starting with. Also, you aren't taking advantage of the structure inherent in the JSON. If the response isn't too big, you can probably take it as is and use Python's JSON reader to produce a Python data structure. It should be much easier (and faster) to process that data structure than to repeatedly scan all those lines of data with regexes.
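
As a rough illustration of working with the parsed structure, here is a sketch that loads a JSON text constant with the standard `json` module and walks the result as ordinary dicts and lists. The constant `SAMPLE_SEARCH_RESPONSE`, the helper `find_page_id`, and the id values are invented for the example; only the key names `pages`, `title`, and `id` are taken from the script quoted above, and a real response will likely contain more fields.

```python
import json

# Hand-written stand-in for a search response (assumed shape, made-up ids).
SAMPLE_SEARCH_RESPONSE = """
{"pages": [
    {"id": 1, "title": "tables"},
    {"id": 2, "title": "table"}
]}
"""


def find_page_id(data, query):
    """Return the id of the page whose title matches query, else None."""
    for page in data.get("pages", []):
        if page.get("title", "").lower() == query.lower():
            return page.get("id")
    return None


# json.loads turns the text into plain Python dicts and lists.
data = json.loads(SAMPLE_SEARCH_RESPONSE)
page_id = find_page_id(data, "table")
print(type(data).__name__, page_id)
```

Once this works on the constant, the same function can be pointed at the live response. Note that in the quoted script the `source` value is handled as a single string, so this sketch only covers the search step.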
