From: Vlastimil Brom on 23 Jun 2010 21:07

Hi all,
I'd like to ask about the most reasonable/recommended way to modify the functionality of a standard library module (if it is recommended at all).

I'm using difflib.SequenceMatcher for character-wise comparison of texts. This may not be a usual use case, but the results are fine for the given task. However, there were some corner cases where the reported differences were clearly larger than needed. As it turned out, this is due to a kind of special-casing of relatively frequent items; cf.
http://bugs.python.org/issue1528074#msg29269
http://bugs.python.org/issue2986

The solution (or workaround) for me was to modify the SequenceMatcher class by adding another parameter, checkpopular=True, which influences the behaviour of the __chain_b function accordingly. The possible speed issues with this optimisation turned off (checkpopular=False) don't really matter now, and the comparison results are much better for my use cases.

However, I'd like to ask how best to maintain this modified functionality in the source code. I tried some possibilities, which seem to work, but I'd appreciate suggestions on the preferred way in such cases.

- It is simply possible to keep a modified source file difflib.py in the script directory.
- Furthermore, one can subclass difflib.SequenceMatcher and override its __chain_b function (however, the name doesn't look like a "public" function ...).
- I guess it wouldn't be recommended to directly replace difflib.SequenceMatcher._SequenceMatcher__chain_b ...

In all cases I have either a copy of the whole file or the respective function as a part of my source.

I'd appreciate comments or suggestions on this, or maybe other, better approaches to this problem.

Thanks in advance,
vbr
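The special-casing of frequent items described in this post can be reproduced with a short script. This is only a minimal sketch, not the poster's code: it assumes Python 3.2 or later, where SequenceMatcher accepts an autojunk flag that plays roughly the role of the checkpopular parameter mentioned above. The sample strings are made up for illustration.

```python
import difflib

# Made-up sample data: b has 200 characters, the length at which the
# popularity heuristic starts to apply, and each character is very frequent.
a = "ab" * 100
b = "ba" * 100

# Default behaviour: very frequent elements of b are dropped from the index.
heuristic = difflib.SequenceMatcher(None, a, b)

# autojunk=False turns the heuristic off (assumes Python 3.2+).
exhaustive = difflib.SequenceMatcher(None, a, b, autojunk=False)

ratio_heuristic = heuristic.ratio()
ratio_exhaustive = exhaustive.ratio()
print("with heuristic:   ", ratio_heuristic)
print("without heuristic:", ratio_exhaustive)
```

The two strings differ only by a one-character shift, so the run without the heuristic should report a much higher similarity than the default run.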
From: Bruno Desthuilliers on 24 Jun 2010 04:07

Vlastimil Brom wrote:
> Hi all,
> I'd like to ask about the most reasonable/recommended/... way to
> modify the functionality of the standard library module (if it is
> recommended at all).

(snip)

> However, I'd like to ask, how to best maintain this modified
> functionality in the sourcecode.
> I tried some possibilities, which seem to work, but I'd appreciate
> suggestions on the preferred way in such cases.
> - It is simply possible to have a modified sourcefile difflib.py in
> the script directory.

You'd better do a real fork then, and rename the damn thing to avoid confusion and name shadowing.

> - Furthermore one can subclass difflib.SequenceMatcher and override its
> __chain_b function (however the name doesn't look like a "public"
> function ...

It's indeed a very "private" one. Beware of name mangling here, it can lead to surprising results !-)

Also, since you'd be overriding an implementation method, your code might break with each new release, so it kind of ties you to a specific version (or set of...). The odds depend on the stability of difflib's source code.

> - I guess, it wouldn't be recommended to directly replace
> difflib.SequenceMatcher._SequenceMatcher__chain_b ...

For which definition of "directly replace"? If you mean patching the standard library's source code in place, then it's definitely not something I'd do. Monkeypatching, OTOH, is sometimes the simplest solution, especially for temporary fixes or evolutions.

Anyway, which solution (forking, subclassing or monkeypatching) is the most appropriate really depends on the context, so only you can decide. If it's for personal use only and not mission-critical, go for the simplest working solution. If it's going to be publicly released, you may want to consider contacting the difflib maintainer and submitting a patch, and rely on a monkeypatch in the meantime. If you think you'll need more modifications / specialisations / evolutions of difflib, then just fork.

My 2 cents.
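The name-mangling caveat raised in this reply can be illustrated without touching difflib at all. The classes below are hypothetical and exist only for this sketch; the point is that a double-underscore method defined in a subclass is mangled with the subclass's own name, so it does not replace the base class's private method unless the mangled name is spelled out.

```python
class Base(object):
    def run(self):
        # Inside Base, "self.__step" is compiled as "self._Base__step".
        return self.__step()

    def __step(self):
        return "base"


class NaiveChild(Base):
    # Stored as _NaiveChild__step, so Base.run() never looks it up.
    def __step(self):
        return "naive child"


class WorkingChild(Base):
    # Spelling out the mangled name does override the base method.
    def _Base__step(self):
        return "working child"


naive_result = NaiveChild().run()
working_result = WorkingChild().run()
print(naive_result, working_result)
```

The same rule would presumably apply to a SequenceMatcher subclass: the base class looks for _SequenceMatcher__chain_b, whatever the subclass is called.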
From: Vlastimil Brom on 24 Jun 2010 11:05

2010/6/24 Bruno Desthuilliers <bruno.42.desthuilliers(a)websiteburo.invalid>:
> Vlastimil Brom wrote:
>>
>> Hi all,
>> I'd like to ask about the most reasonable/recommended/... way to
>> modify the functionality of the standard library module (if it is
>> recommended at all).
>
> ...
>> - I guess, it wouldn't be recommended to directly replace
>> difflib.SequenceMatcher._SequenceMatcher__chain_b ...
>
> For which definition of "directly replace"? If you mean patching the
> standard library's source code in place, then it's definitely not something I'd do.
> Monkeypatching, OTOH, is sometimes the simplest solution, especially for
> temporary fixes or evolutions.
>
> Anyway, which solution (forking, subclassing or monkeypatching) is the most
> appropriate really depends on the context, so only you can decide. If it's
> for personal use only and not mission-critical, go for the simplest working
> solution. If it's going to be publicly released, you may want to consider
> contacting the difflib maintainer and submitting a patch, and rely on a
> monkeypatch in the meantime. If you think you'll need more
> modifications / specialisations / evolutions of difflib, then just fork.
>
> My 2 cents.
> --
>

Many thanks for your insights!
Right now I am almost the only user of this script, hence the consequences of version mismatches etc. shouldn't (directly) affect anyone else, fortunately.

However, I'd like to ask for some clarification about monkeypatching. By "directly replace" I meant something like the following scenario:

    import difflib
    .....
    def tweaked__chain_b(self):
        # modified code of the function __chain_b, copied from Lib\difflib.py
        ...

    difflib.SequenceMatcher._SequenceMatcher__chain_b = tweaked__chain_b

This way I can only change the functionality unconditionally, as the signature of SequenceMatcher (which is then used in my script) remains unchanged.

I thought this would qualify as monkeypatching, but I am apparently missing some distinction between "patching the ... code inplace" and "monkeypatching". Does it maybe make a difference if one makes "backups" of the original objects and reactivates them after using the patched code?

By subclassing (which I am using right now in the code) the behaviour can be parametrised:

    class my_difflib_SequenceMatcher(difflib.SequenceMatcher):
        def __init__(self, isjunk=None, a='', b='', checkpopular=True):
            # checkpopular: parameter added to the signature
            self.checkpopular = checkpopular
            ...
        def __chain_b(self):
            # modified copy from Lib\difflib.py - reacting to the value of self.checkpopular

An "official" update of the source in the standard library is probably not viable (at least not in a way that would currently help me, as my code only supports Python 2.x due to the relevant dependencies (wxpython ....)). Otherwise, it would depend on other users' needs (e.g. a finer diff at the cost of much slower code in some cases).

Thanks again for your thoughts.
vbr
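The subclassing variant outlined in this message could look roughly like the sketch below. It is not the poster's code: the class name TweakedMatcher and the sample strings are invented, and the body relies on attributes (b2j, bjunk, bpopular) that are implementation details of CPython 3.x's difflib and may change between releases. Note that the method is defined under its mangled name, since a method literally called __chain_b in a subclass would be mangled with the subclass's name instead.

```python
import difflib


class TweakedMatcher(difflib.SequenceMatcher):
    """Hypothetical subclass; relies on CPython 3.x difflib internals."""

    def __init__(self, isjunk=None, a='', b='', checkpopular=True):
        # Must be set first: the base __init__ already triggers __chain_b.
        self.checkpopular = checkpopular
        difflib.SequenceMatcher.__init__(self, isjunk, a, b)

    # Written with the mangled name so that the base class actually calls it.
    def _SequenceMatcher__chain_b(self):
        # Let the stock implementation build its index first.
        difflib.SequenceMatcher._SequenceMatcher__chain_b(self)
        if self.checkpopular:
            return
        # Assumption: b2j maps each element of b to its list of positions,
        # and bjunk holds the elements rejected by isjunk.
        b2j = {}
        for j, elt in enumerate(self.b):
            if elt not in self.bjunk:
                b2j.setdefault(elt, []).append(j)
        self.b2j = b2j
        self.bpopular = set()


a = "ab" * 100
b = "ba" * 100
default_ratio = TweakedMatcher(None, a, b).ratio()
tweaked_ratio = TweakedMatcher(None, a, b, checkpopular=False).ratio()
print(default_ratio, tweaked_ratio)
```

Setting self.checkpopular before calling the base constructor matters here, because the base constructor already builds the index for b.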
From: Bruno Desthuilliers on 24 Jun 2010 12:07

Vlastimil Brom wrote:
> Many thanks for your insights!
> Just now, I am the almost the only user of this script, hence the
> consequences of version mismatches etc. shouldn't (directly) affect
> anyone else, fortunately.

So far so good.

> However, I'd like to ask for some clarification about monkeypatching -
> With "directly replace" I meant something like the following scenario:
>
> import difflib
> ....
> def tweaked__chain_b(self):
>     # modified code of the function __chain_b copy from Lib\difflib.py
>     ...
>
> difflib.SequenceMatcher._SequenceMatcher__chain_b = tweaked__chain_b
>
> I thought, this would qualify as monkeypatching,

It does, indeed.

> but I am apparently
> missing some distinction between "patching the ... code inplace" and
> "monkeypatching".

"Patching source code" canonically means "physically" modifying the original source file. Monkeypatching - which can only be done in some dynamic languages - is what you're doing above, i.e. dynamically replacing a given feature at runtime.

> By subclassing (which I am using just now in the code)

If it already works and you don't have to care too much about possible compatibility issues with different difflib versions, then look no further.
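The runtime replacement discussed here, together with the "backup and reactivate" idea from the question, might be sketched as a small context manager. The helper patched_chain_b and the tracing replacement are hypothetical names made up for this example, and the sketch assumes Python 3, where the private method is reachable on the class as _SequenceMatcher__chain_b.

```python
import contextlib
import difflib

_NAME = "_SequenceMatcher__chain_b"  # mangled name of the private method


@contextlib.contextmanager
def patched_chain_b(replacement):
    """Temporarily swap in `replacement`, restoring the original afterwards."""
    original = getattr(difflib.SequenceMatcher, _NAME)  # the "backup"
    setattr(difflib.SequenceMatcher, _NAME, replacement)
    try:
        yield original
    finally:
        setattr(difflib.SequenceMatcher, _NAME, original)


stock = getattr(difflib.SequenceMatcher, _NAME)
calls = []


def traced_chain_b(self):
    # Illustrative replacement: record the call, then defer to the stock code.
    calls.append(len(self.b))
    stock(self)


with patched_chain_b(traced_chain_b):
    difflib.SequenceMatcher(None, "abc", "abd").ratio()

restored = getattr(difflib.SequenceMatcher, _NAME) is stock
print(calls, restored)
```

Restoring the original in a finally clause keeps the patch scoped to the with block even if the comparison raises, which limits the effect on other code that imports difflib.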
From: Vlastimil Brom on 24 Jun 2010 13:40

2010/6/24 Bruno Desthuilliers <bruno.42.desthuilliers(a)websiteburo.invalid>:
> Vlastimil Brom wrote:
.....
>
> "patching source code" canonically means "physically" modifying the original
> source file. Monkeypatching - which can only be done in some dynamic
> languages - is what you're doing above, ie dynamically replacing a given
> feature at runtime.
>
>>

Thank you very much for the clarification (I indeed didn't consider this "ultima ratio" approach :-)
Thanks for the positive suggestion as well.

Regards,
vbr