Subject : Re: A technique from a chatbot
From : nntp.mbourne (at) *nospam* spamgourmet.com (Mark Bourne)
Newsgroups : comp.lang.python
Date : 05. Apr 2024, 21:59:54
Other headers
Organisation : A noiseless patient Spider
Message-ID : <uupl7t$1ird0$1@dont-email.me>
References : 1 2 3 4 5 6
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 SeaMonkey/2.53.18.1
Stefan Ram wrote:
Mark Bourne <nntp.mbourne@spamgourmet.com> wrote or quoted:
I don't think there's a tuple being created. If you mean:
( word for word in list_ if word[ 0 ]== 'e' )
...that's not creating a tuple. It's a generator expression, which
generates the next value each time it's called for. If you only ever
ask for the first item, it only generates that one.
Yes, that's also how I understand it!
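To make the laziness concrete, here is a small sketch. The sample list and the `starts_with_e` helper are made up for illustration; the helper records every word it is asked to test, so we can see how far the generator expression actually got.

```python
def starts_with_e(word, seen):
    """Record every word that gets tested, then test it."""
    seen.append(word)
    return word[0] == 'e'

words = ['apple', 'egg', 'eel', 'fig']   # hypothetical sample data
seen = []

# The parentheses build a generator object, not a tuple.
gen = (word for word in words if starts_with_e(word, seen))
print(type(gen).__name__)   # generator

first = next(gen, '')       # '' is the default if nothing matches
print(first)                # egg
print(seen)                 # ['apple', 'egg']
```

Only 'apple' and 'egg' were ever tested; 'eel' and 'fig' were never looked at.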
In the meantime, I wrote code for a microbenchmark, shown below.
This code, when executed on my computer, shows that the
next+generator approach is a bit faster when compared with
the procedural break approach. But when the order of the two
approaches is being swapped in the loop, then it is shown to
be a bit slower. So let's say, it takes about the same time.
There could be some caching going on, meaning whichever is done second comes out a bit faster.
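One way to reduce the effect of ordering is to time each approach several times, in both orders, and keep the best time for each. The following is only a sketch under some assumptions: `using_break`, `using_next` and `best_time` are names invented here, and wrapping both approaches in functions changes what is being measured compared with the module-level code in the original benchmark.

```python
import random
import string
import timeit

random.seed(0)  # fixed seed so runs are repeatable
words = [''.join(random.choices(string.ascii_lowercase,
                                k=random.randint(1, 30)))
         for _ in range(50)]

def using_break(list_):
    for word in list_:
        if word[0] == 'e':
            result = word
            break
    else:
        result = ''
    return result

def using_next(list_):
    return next((word for word in list_ if word[0] == 'e'), '')

def best_time(func, repeat=5, number=2000):
    """Smallest total time (in seconds) over several repeats."""
    return min(timeit.repeat(lambda: func(words),
                             repeat=repeat, number=number))

funcs = {'break': using_break, 'next': using_next}
timings = {'break': [], 'next': []}

# Run the pair in both orders so neither always goes second.
for order in (('break', 'next'), ('next', 'break')):
    for name in order:
        timings[name].append(best_time(funcs[name]))

for name, values in timings.items():
    print(f'{name}: best of both orders = {min(values):.6f} s')
```

Taking the minimum rather than the mean follows the advice in the `timeit` documentation: the higher values are usually caused by other processes interfering, not by variability in Python's speed.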
However, I also tested code with an early return (not shown below),
and this was shown to be faster than both code using break and
code using next+generator by a factor of about 1.6, even though
the code with return has the "function call overhead"!
To be honest, that's how I'd probably write it - not because of any thought that it might be faster, but just because it's clearer. And if there's a `do_something_else()` that needs to be called regardless of whether a word was found, split it into two functions:
```
def first_word_beginning_with_e(target, wordlist):
    for w in wordlist:
        if w.startswith(target):
            return w
    return ''

def find_word_and_do_something_else(target, wordlist):
    result = first_word_beginning_with_e(target, wordlist)
    do_something_else()
    return result
```
But please be aware that such results depend on the Python
implementation and version being used for the benchmark
and also on the details of how exactly the benchmark is written.
import random
import string
import timeit

print( 'The following loop may need a few seconds or minutes, '
       'so please bear with me.' )

time_using_break = 0
time_using_next = 0

for repetition in range( 100 ):
    for i in range( 100 ): # Yes, this nesting is redundant!

        list_ = \
        [ ''.join \
          ( random.choices \
            ( string.ascii_lowercase, k=random.randint( 1, 30 )))
          for i in range( random.randint( 0, 50 ))]

        start_time = timeit.default_timer()
        for word in list_:
            if word[ 0 ]== 'e':
                word_using_break = word
                break
        else:
            word_using_break = ''
        time_using_break += timeit.default_timer() - start_time

        start_time = timeit.default_timer()
        word_using_next = \
        next( ( word for word in list_ if word[ 0 ]== 'e' ), '' )
        time_using_next += timeit.default_timer() - start_time

        if word_using_next != word_using_break:
            raise Exception( 'word_using_next != word_using_break' )

print( f'{time_using_break = }' )
print( f'{time_using_next = }' )
print( f'{time_using_next / time_using_break = }' )
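Since the results depend on the implementation and version, it may help to print those alongside the timings when posting numbers. A minimal sketch using only the standard library (the `=` form in f-strings needs Python 3.8 or later, as in the benchmark above):

```python
import platform
import sys

# Report which interpreter produced the timings.
print(f'{platform.python_implementation() = }')
print(f'{platform.python_version() = }')
print(f'{sys.platform = }')
```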