Subject : Re: Correct syntax for pathological re.search()
From : ram (at) *nospam* zedat.fu-berlin.de (Stefan Ram)
Newsgroups : comp.lang.python
Date : 12. Oct 2024, 12:59:52
Organization : Stefan Ram
Message-ID : <compiled-20241012123950@ram.dialup.fu-berlin.de>
References : 1 2 3 4 5
"Peter J. Holzer" <hjp-python@hjp.at> wrote or quoted:
> But - without having looked at the implementation - it's far from clear
> that the compiled form would be useful to the user.

So, what he might be getting at with "compiled form" is a
representation that's easy on the eyes for us mere mortals.
You could, for instance, use colors to show the difference between
object and meta characters. In that case, the regex "\**" would
come out as "**", but the first "*" might be navy blue (on a white
background), so just your run-of-the-mill object character, while
the second one would be burgundy, flagging it as a meta character.
So, simplified, that would be something like:

import tkinter as tk

def tokenize_regex( pattern ):
    # Split the pattern into ( token_type, token_value ) pairs.
    tokens = []
    i = 0
    while i < len( pattern ):
        if pattern[ i ] == '\\':
            # A backslash turns the next character into an object character.
            if i + 1 < len( pattern ):
                tokens.append( ( 'escaped', pattern[ i+1: i+2 ]))
                i += 2
            else:
                tokens.append( ( 'error', 'Incomplete escape sequence' ))
                i += 1
        elif pattern[ i ] == '*':
            # An unescaped "*" is a meta character.
            tokens.append( ( 'repetition', '*' ))
            i += 1
        else:
            tokens.append( ( 'plain', pattern[ i ]))
            i += 1
    return tokens

root = tk.Tk()
root.configure( bg='white' )
regex = r'\**'
result = tokenize_regex( regex )
for token_type, token_value in result:
    if token_type == 'plain' or token_type == 'escaped':
        # object characters
        tk.Label( root, text=token_value, font=( 'Arial', 40 ), fg='#4070FF', bg='white' ).pack( side='left' )
    elif token_type == 'repetition':
        # meta characters
        tk.Label( root, text=token_value, font=( 'Arial', 40 ), fg='#C02000', bg='white' ).pack( side='left' )
root.mainloop()
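
If you'd rather not fire up a tkinter window, here's a rough sketch of the same idea for a terminal. It assumes the terminal understands ANSI escape sequences, and the names ANSI_COLORS, ANSI_RESET and colorize_regex are made up for this example. The tokenizer is repeated so that the snippet runs on its own.

```python
# Assumption: the terminal interprets ANSI color escape sequences.
# ANSI_COLORS, ANSI_RESET and colorize_regex are hypothetical names.
ANSI_COLORS = {
    'plain':      '\033[34m',  # blue: object character
    'escaped':    '\033[34m',  # blue: an escaped character is an object character too
    'repetition': '\033[31m',  # red: meta character
}
ANSI_RESET = '\033[0m'

def tokenize_regex( pattern ):
    # Split the pattern into ( token_type, token_value ) pairs.
    tokens = []
    i = 0
    while i < len( pattern ):
        if pattern[ i ] == '\\':
            if i + 1 < len( pattern ):
                tokens.append( ( 'escaped', pattern[ i+1: i+2 ]))
                i += 2
            else:
                tokens.append( ( 'error', 'Incomplete escape sequence' ))
                i += 1
        elif pattern[ i ] == '*':
            tokens.append( ( 'repetition', '*' ))
            i += 1
        else:
            tokens.append( ( 'plain', pattern[ i ]))
            i += 1
    return tokens

def colorize_regex( pattern ):
    # Wrap every token value in the color of its token type.
    parts = []
    for token_type, token_value in tokenize_regex( pattern ):
        if token_type in ANSI_COLORS:  # 'error' tokens are skipped here
            parts.append( ANSI_COLORS[ token_type ] + token_value + ANSI_RESET )
    return ''.join( parts )

if __name__ == '__main__':
    print( colorize_regex( r'\**' ))
```

On a terminal with color support, this should show two asterisks, the first one in blue and the second one in red.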