Subject: RE: Module urljoin does not appear to work with scheme Gemini
From: ajm (at) *nospam* flonidan.dk (Anders Munch)
Newsgroups: comp.lang.python
Date: 24. Apr 2025, 09:36:12
Message-ID: <mailman.30.1745483776.3008.python-list@python.org>
References: 1 2 3 4 5
Henry S. Thompson wrote:
> Some approach to support future-proofing in general would seem to be in order.
> Given some other precedents, adding a boolean argument called either 'strict' or 'lax' would be my preference.

An alternative would be to refactor urllib.parse to use strategy objects
for schemes.
parse.py contains a number of lists of scheme names that act as flags to
control parsing behaviour:
uses_relative, uses_netloc, uses_params, non_hierarchical, uses_query and uses_fragment.
(If written today they would be sets, but this is very old code that predates sets!)
Group that information by scheme instead of by flag name, in e.g. a dataclass, and
you have made yourself a strategy object lookup table:
    scheme_options = {
        'https': SchemeOptions(uses_relative=True, uses_netloc=True, uses_params=True),
        'git': SchemeOptions(uses_relative=False, uses_netloc=True, uses_params=False),
        ...
    }
Once you have that, you can add the strategy object as an optional argument to
functions. If the argument is not given, you find a strategy object from
scheme_options to use. If the argument is given, you use that.
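
A minimal sketch of the table and the optional-argument lookup, assuming a hypothetical SchemeOptions dataclass and a hypothetical options_for helper (neither exists in urllib.parse). It builds the table from the three lists that parse.py does consult, so the entries reflect whatever the installed Python version ships:

```python
from dataclasses import dataclass
from urllib import parse


@dataclass(frozen=True)
class SchemeOptions:
    """Hypothetical per-scheme strategy object; not part of urllib.parse."""
    uses_relative: bool = False
    uses_netloc: bool = False
    uses_params: bool = False


def build_scheme_options():
    """Regroup the per-flag lists in urllib.parse into one entry per scheme."""
    names = set(parse.uses_relative) | set(parse.uses_netloc) | set(parse.uses_params)
    return {
        name: SchemeOptions(
            uses_relative=name in parse.uses_relative,
            uses_netloc=name in parse.uses_netloc,
            uses_params=name in parse.uses_params,
        )
        for name in sorted(names)
    }


scheme_options = build_scheme_options()


def options_for(url, options=None):
    """Return the caller's options if given, else look them up by scheme."""
    if options is not None:
        return options
    scheme = parse.urlsplit(url).scheme
    # Assumption: schemes missing from the table get all flags switched off.
    return scheme_options.get(scheme, SchemeOptions())
```

A function taking an options argument would call options_for(url, options) once and then test options.uses_relative and friends instead of checking membership in the module-level lists.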
The best part of this approach is that you now have a way of saying "treat this
scheme exactly like https":
    from urllib import parse
    parse.urljoin('sptth://...', '../one-level-up', options=parse.scheme_options['https'])
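
The options argument and parse.scheme_options in that call are part of the proposal; they are not in the standard library today. As a rough illustration of the same idea with the current API, the sketch below borrows the rules of a known scheme by swapping the scheme prefix before and after the join. The name urljoin_like and the example.org URLs are made up for the example, and this is a workaround, not the refactoring described above:

```python
from urllib import parse


def urljoin_like(base, url, like='https'):
    """Join url onto base using the rules urllib.parse applies to `like`."""
    like = like.lower()
    base_scheme = parse.urlsplit(base).scheme
    url_scheme = parse.urlsplit(url).scheme
    # Nothing to borrow when base has no scheme, or when the reference
    # names a different scheme (urljoin then returns the reference as is).
    if not base_scheme or (url_scheme and url_scheme != base_scheme):
        return parse.urljoin(base, url)

    def swap(text, old, new):
        # Replace a leading "old:" with "new:"; schemes are case-insensitive.
        prefix = old + ':'
        if text[:len(prefix)].lower() == prefix:
            return new + ':' + text[len(prefix):]
        return text

    joined = parse.urljoin(swap(base, base_scheme, like),
                           swap(url, base_scheme, like))
    # Put the original scheme back on the result.
    return swap(joined, like, base_scheme)


print(urljoin_like('sptth://example.org/a/b/c', '../one-level-up'))
```

With plain parse.urljoin the same call returns the relative reference unchanged, because 'sptth' is not listed in uses_relative.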
Note: I wrote this before I realised that the lists non_hierarchical, uses_query
and uses_fragment are not used. With only three options instead of six, making
a strategy object is not quite as attractive. But still worth considering.
regards, Anders