From: Peter Olcott on 29 Mar 2010 12:10

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:3o40r5ht9c7hmk3s65p6no5asjqp5aiq76(a)4ax.com...
> See below...
> On Fri, 26 Mar 2010 11:01:54 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>
>> The experts are telling me that my real-time process does not need to be memory resident.
>
> ***
> I said that
> (a) real-time has to be defined in terms of a response window of 100 ms for a client request of maximum size

Now that I have done some benchmark testing of the probable performance of the (current redesign) space-within-time optimization of my OCR algorithm, much of what I said previously becomes moot. Response time is so much faster (because of spatial locality of reference) that I now have time to load the data during the client request. Previously this would have destroyed my target response time of 100 ms; now it fits well within it.

> (b) there are a large number of features whose time constants have the property that they can interfere with the requirement (a), of which paging MIGHT be one (but you have insisted that it will ALWAYS be a problem, in ANY realtime system, under ANY conditions) but (b) scheduling, cache behavior, priority inversion, and many other characteristics might ALSO be a problem and
> (c) anything which interferes with (a) must be eliminated. Paging MIGHT be one of these.
>
> Please quote me correctly.
> joe
>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
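Since the whole claim above turns on whether loading the data during a client request fits inside the 100 ms window, it can be measured rather than asserted. A minimal C++ sketch; the file name is a placeholder, not the actual OCR data, and 100 ms is just the budget quoted in the post:

// Sketch: time a cold load of the recognition data against the 100 ms
// response budget. "dfa_data.bin" is a placeholder name, not the real file.
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

int main() {
    using clock = std::chrono::steady_clock;
    const long long budgetMs = 100;               // budget quoted in the post

    auto start = clock::now();
    std::ifstream in("dfa_data.bin", std::ios::binary);
    std::vector<char> data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
                       clock::now() - start).count();

    std::printf("loaded %zu bytes in %lld ms (budget %lld ms)\n",
                data.size(), static_cast<long long>(elapsed), budgetMs);
    return elapsed <= budgetMs ? 0 : 1;
}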
From: Joseph M. Newcomer on 29 Mar 2010 12:45

See below...
On Mon, 29 Mar 2010 09:57:59 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>
> "Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:oesvq55safqsrg8jih8peiaah4uiqt0qi3(a)4ax.com...
>> Well, I know the answer, and I think you are behaving in yet another clueless fashion. And in my earlier reply I told you why. You want "fault tolerance" without even understanding what that means, and choosing an implementation whose fundamental approach to fault
>
> The only fault tolerance that I want or need can be provided very simply. The original specification of fault tolerance that I provided was much more fault tolerance than would be cost-effective. If I really still wanted this level of fault tolerance then many of your comments on this subject would not be moot. Since this degree of fault tolerance has been determined to never be cost-effective, any details of providing this level of fault tolerance become moot.
>
> The other Pete had greater insights into my own needs than I did myself. I will paraphrase what he said. I only need to avoid losing transactions. When a client makes a request, I only need to avoid losing this request until it is completed. Any faults in-between can be restarted from the beginning.
****
Your total cluelessness about TCP/IP comes to the fore again. Suppose you have established a connection to the machine. The machine reboots. What happened to that connection? Well, IT NO LONGER EXISTS! So you can't reply over it! Even if you have retained the information about the data to be processed, YOU HAVE NO WAY TO COMMUNICATE TO THE CLIENT MACHINE! In what fantasy world does the psychic plane allow you to magically re-establish communication with the client machine?

And don't tell me you can use the IP address to re-establish connectivity. If you don't understand how NAT works, both at the local level and at the ISP level, you cannot tell me that retaining the IP address can work, because I would immediately know you were wrong.
****
>
> The key (not all the details, just the essential basis for making it work) to providing this level of fault tolerance is to have the webserver only acknowledge web requests after the web request has been committed to persistent storage.
****
Your spec of dealing with someone pulling the plug, as I explained, is a pointless concern. So why are you worrying about something that has a large negative exponent in its probability (10**-n for n something between 6 and 15)? There are higher-probability events you MIGHT want to worry about.
****
>
> The only remaining essential element (not every little detail, just the essence) is providing a way to keep track of web requests to make sure that they make it to completed status in a reasonable amount of time. A timeout threshold and a generated exception report can provide feedback here.
****
But if you have a client timeout, the client can resubmit the request, so there is no need to retain it on the server. So why are you desirous of expending effort to deal with an unlikely event? And implementing complex mechanisms to solve problems that do not require solution on the server side? And at no point did you talk about how you do the PayPal credit, and if you are concerned with ANY robustness, THAT's the place you have to worry about it!

And how does PayPal and committed transactions sit with your magical 500ms limit and the "no paging, no disk access, ever" requirements?
****
>
> Please make any responses to the above statement within the context of the newly defined, much narrower scope of fault tolerance.
****
If by "fault tolerance" you mean "recovering from pulling the plug from the wall" my response is simple: "you are clueless as to what is actually important". I would first of all redefine fault tolerance to mean "No client is ever charged for a transaction that is not completed", and that is the ONLY important "fault tolerance" you need. Anything else is just completely silly, because of the low probability. And you have created two requirements (PayPal and transacted file system) that are incompatible with all your previous non-negotiable requirements.
joe
****
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
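For what it's worth, the "acknowledge only after the request is durable" rule Peter stated and the "no client is ever charged for an uncompleted transaction" rule above are both about ordering rather than volume of code. A hedged sketch of the acknowledge-after-commit ordering, using SQLite purely as a stand-in for whatever transacted store is actually chosen; the requests table and the readRequest()/sendAck() helpers are hypothetical:

// Sketch: persist the incoming request and COMMIT before acknowledging it.
// SQLite is only a stand-in transacted store; the table name and the
// readRequest()/sendAck() helpers are hypothetical placeholders.
#include <sqlite3.h>
#include <string>

std::string readRequest(int clientSocket);               // hypothetical
void sendAck(int clientSocket, long long requestId);     // hypothetical

bool handleIncomingRequest(sqlite3* db, int clientSocket) {
    std::string payload = readRequest(clientSocket);

    sqlite3_exec(db, "BEGIN IMMEDIATE;", nullptr, nullptr, nullptr);
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db,
        "INSERT INTO requests(payload, state) VALUES(?, 'pending');",
        -1, &stmt, nullptr);
    sqlite3_bind_text(stmt, 1, payload.c_str(), -1, SQLITE_TRANSIENT);
    bool inserted = (sqlite3_step(stmt) == SQLITE_DONE);
    sqlite3_finalize(stmt);

    if (inserted &&
        sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr) == SQLITE_OK) {
        // Only now is the request durable, so only now do we acknowledge it.
        sendAck(clientSocket, sqlite3_last_insert_rowid(db));
        return true;
    }
    sqlite3_exec(db, "ROLLBACK;", nullptr, nullptr, nullptr);
    return false;   // no acknowledgement sent; the client times out and retries
}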
From: Joseph M. Newcomer on 29 Mar 2010 13:05

See below...
On Mon, 29 Mar 2010 10:22:59 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>
> "Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:4ftvq5l0qdfmrfmd4a351cc0lt98er8p56(a)4ax.com...
>> See below...
>> On Fri, 26 Mar 2010 09:55:54 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>>>
>>> "Oliver Regenfelder" <oliver.regenfelder(a)gmx.at> wrote in message news:e148c$4bac7685$547743c7$23272(a)news.inode.at...
>>>> Hello,
>>>>
>>>> Peter Olcott wrote:
>>>>> I don't know. It does not yet seem worth the learning curve cost. The process is intended to be always running and loaded with data.
>>>>
>>>> I would say using memory mapped files with e.g. boost is not that steep a learning curve.
>>>>
>>>> Best regards,
>>>>
>>>> Oliver
>>>
>>> If we are talking on the order of one day to become an expert on this, and it substantially reduces my data load times, it may be worth looking into. I am still convinced that it is totally useless for optimizing page faults for my app because I am still totally convinced that preventing page faults is a far better idea than making them fast. Locking pages into memory will be my approach, or whatever the precise terminology is.
>> ***
>> And your ****opinion*** about what is going to happen to your page fault budget is meaningless noise because you have NO DATA to tell you ANYTHING! You have ASSUMED that a MMF is going to "increase" your page faults, with NO EVIDENCE to support this ridiculous
>
> I did not say anything at all like this. Here is what I said:
> (1) I need zero page faults
****
Which already says you are clueless. What you REALLY said was "I need to respond within 500ms" and somehow you have converted this requirement to "I need zero page faults". And you have now added "I must process page faults in order to use a transacted database". Huh? [Did you miss the part where people explained that the disk file system is based entirely on paging?]

> (2) MMF does not provide zero page faults
****
In this case, I have to say it: You are STUPID! I have explained SEVERAL TIMES why an MMF does NOT induce additional page faults, but you have failed to comprehend ANYTHING Hector and I have been telling you! I have even tried to explain how it can REDUCE your startup cost! But no, you REFUSE to pay attention to ANYTHING we have said.

This is not an ad hominem attack. This is an evaluation based upon a long set of interactions in which we tell you "A" and you tell us "not A". And we say "A" and you insist "not A". And we say "you are wrong, it is A" and point you at references, and you apparently do not read them, and you say "not A". There are very few characterizations of a person who holds such a belief in the case where there is overwhelming evidence. One is "religious fanatic" and the other is "stupid". So as long as you continue to make stupid statements that conflict with reality, you will continue to demonstrate that you are not qualified to work in this profession. I'm tired of telling you that you are out of contact with reality. You apparently inhabit a reality which is NOT AT ALL the reality the rest of us live and work in.
****
> (3) Locking memory does provide zero page faults
****
Locking memory solves the zero page faults problem, and introduces a whole whopping lot of new problems, and does NOT change the PayPal or committed-transactions problems.
****
> (4) Therefore I need locking memory and not MMF
***
I repeat: YOU ARE STUPID! NOWHERE DID ANYONE *EVER* TELL YOU THAT MMF INDUCED ADDITIONAL PAGE FAULTS!!!! In fact, we tried to tell you that this is NOT true! This is a fiction you have invented, based on your own flawed understanding of reality! It isn't that we haven't tried to educate you!

A refusal to accept facts as stated and insist that your fantasy is reality is indicative of a serious loss of contact with reality. So the other option is that you are in need of serious psychological care because you are denying reality. Perhaps you are not stupid, just in need of professional counseling by someone whose profession is not computing but the human mind. Medication might help. Those little voices that keep telling you "Memory Mapped Files induce uncontrolled page faults" may be the problem. Hector and I do not hear them, and I know that I am not taking psychoactive drugs to suppress the little voices that whisper "Memory Mapped Files induce uncontrolled page faults". I suspect Hector isn't, either. I suspect that we both understand how the operating system works.

And where, in all the reading you have done, is there any evidence that VirtualLock will not apply to a block of memory that is mapped to a file? Duh! In fact, it DOES apply. But had you actually spent any time READING about how any of this works, you would have UNDERSTOOD this obvious fact! It isn't that we didn't try to explain it!
joe
*****
>
>> position. At the very worst, you will get THE SAME NUMBER of page faults, and in the best case you will have FEWER page faults. But go ahead, assume your fantasy is correct, and lose out on optimizations you could have. You are fixated on erroneous premises, which the rest of us know are erroneous, and you refuse to learn anything that might prove your fantasy is flawed.
>> joe
>> ****
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
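The narrow technical point in dispute here, whether VirtualLock applies to a view of a memory-mapped file, takes only a few lines to try. A minimal Win32 sketch; the file name is a placeholder, the working-set margins are arbitrary, and the view is mapped read-only so the data cannot be accidentally modified:

// Sketch: map a data file read-only and pin the mapped view into physical
// memory with VirtualLock. "dfa_data.bin" and the working-set margins are
// placeholder values.
#include <windows.h>
#include <stdio.h>

int main() {
    HANDLE hFile = CreateFileA("dfa_data.bin", GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    LARGE_INTEGER size;
    GetFileSizeEx(hFile, &size);

    HANDLE hMap = CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    void* view = MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
    if (view == NULL) return 1;

    // VirtualLock is limited by the working-set maximum, so raise it first.
    SetProcessWorkingSetSize(GetCurrentProcess(),
                             (SIZE_T)size.QuadPart + (20u << 20),
                             (SIZE_T)size.QuadPart + (40u << 20));

    if (!VirtualLock(view, (SIZE_T)size.QuadPart)) {
        printf("VirtualLock failed: %lu\n", GetLastError());
        return 1;
    }
    printf("locked %lld bytes of a read-only mapped view into RAM\n",
           size.QuadPart);

    // ... use the data; pages stay resident until unlocked/unmapped ...

    VirtualUnlock(view, (SIZE_T)size.QuadPart);
    UnmapViewOfFile(view);
    CloseHandle(hMap);
    CloseHandle(hFile);
    return 0;
}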
From: Peter Olcott on 29 Mar 2010 13:12

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:mcl1r597pthv9priqa6vla6np19l6p0ic1(a)4ax.com...
> See below...
> On Mon, 29 Mar 2010 09:57:59 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>
>> "Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:oesvq55safqsrg8jih8peiaah4uiqt0qi3(a)4ax.com...
>>> Well, I know the answer, and I think you are behaving in yet another clueless fashion. And in my earlier reply I told you why. You want "fault tolerance" without even understanding what that means, and choosing an implementation whose fundamental approach to fault
>>
>> The only fault tolerance that I want or need can be provided very simply. The original specification of fault tolerance that I provided was much more fault tolerance than would be cost-effective. If I really still wanted this level of fault tolerance then many of your comments on this subject would not be moot. Since this degree of fault tolerance has been determined to never be cost-effective, any details of providing this level of fault tolerance become moot.
>>
>> The other Pete had greater insights into my own needs than I did myself. I will paraphrase what he said. I only need to avoid losing transactions. When a client makes a request, I only need to avoid losing this request until it is completed. Any faults in-between can be restarted from the beginning.
> ****
> Your total cluelessness about TCP/IP comes to the fore again. Suppose you have established a connection to the machine. The machine reboots. What happened to that connection? Well, IT NO LONGER EXISTS! So you can't reply over it! Even if you have retained the information about the data to be processed, YOU HAVE NO WAY TO COMMUNICATE TO THE CLIENT MACHINE!

False assumption. A correct statement would be: I have no way to communicate with the client that you are aware of (see below).

> In what fantasy world does the psychic plane allow you to magically re-establish communication with the client machine?

That one is easy. All users of my system must provide a verifiably valid email address. If at any point after the client request is fully received the connection is lost, the output is sent to the email address.

> And don't tell me you can use the IP address to re-establish connectivity. If you don't understand how NAT works, both at the local level and at the ISP level, you cannot tell me that retaining the IP address can work, because I would immediately know you were wrong.
> ****
>>
>> The key (not all the details, just the essential basis for making it work) to providing this level of fault tolerance is to have the webserver only acknowledge web requests after the web request has been committed to persistent storage.
> ****
> Your spec of dealing with someone pulling the plug, as I explained, is a pointless concern.

And I have already said this preliminary spec has been rewritten.

> So why are you worrying about something that has a large negative exponent in its probability (10**-n for n something between 6 and 15)? There are higher-probability events you MIGHT want to worry about.
> ****
>>
>> The only remaining essential element (not every little detail, just the essence) is providing a way to keep track of web requests to make sure that they make it to completed status in a reasonable amount of time.
>> A timeout threshold and a generated exception report can provide feedback here.
> ****
> But if you have a client timeout, the client can resubmit the request, so there is no need to retain it on the server. So why are you desirous of expending effort to deal with an unlikely event? And implementing complex mechanisms to solve problems that do not require

Every request costs a dime. If the client re-submits the same request it costs another dime. Once a request is explicitly acknowledged as received, the acknowledgement response will also inform them that resubmitting will incur an additional charge.

> solution on the server side? And at no point did you talk about how you do the PayPal credit, and if you are concerned with ANY robustness, THAT's the place you have to worry about it!
>
> And how does PayPal and committed transactions sit with your magical 500ms limit and the "no paging, no disk access, ever" requirements?
> ****
>>
>> Please make any responses to the above statement within the context of the newly defined, much narrower scope of fault tolerance.
> ****
> If by "fault tolerance" you mean "recovering from pulling the plug from the wall" my

No, not anymore. Now that I have had some time to think about fault tolerance (for the first time in my life) it becomes obvious that this will not be the benchmark, except for the initial request / request acknowledgement part of the process.

> response is simple: "you are clueless as to what is actually important". I would first of all redefine fault tolerance to mean "No client is ever charged for a transaction that is not completed", and that is the ONLY important "fault tolerance" you need. Anything else is just completely silly, because of the low probability. And you have created two requirements (PayPal and transacted file system) that are incompatible with all your previous non-negotiable requirements.
> joe
> ****
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
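The fallback Peter describes above (answer on the original connection if it is still alive, otherwise mail the output to the verified address) reduces to a small piece of delivery logic. A hedged sketch; sendOnSocket() and sendEmail() are hypothetical helpers standing in for whatever HTTP and SMTP machinery is actually used:

// Sketch of the delivery fallback: reply on the original connection when
// possible, otherwise deliver the result to the verified email address.
// sendOnSocket() and sendEmail() are hypothetical helpers, not a real API.
#include <string>

bool sendOnSocket(int socket, const std::string& body);              // hypothetical
bool sendEmail(const std::string& address, const std::string& body); // hypothetical

struct Job {
    int         socket;       // -1 once the connection is known to be dead
    std::string clientEmail;  // verified when the account was created
    std::string result;       // the OCR output
};

// Returns true if the result reached the client by either route.
bool deliverResult(Job& job) {
    if (job.socket != -1 && sendOnSocket(job.socket, job.result))
        return true;                      // normal case: connection still alive
    // Connection lost (server restart, NAT timeout, client crash): fall back.
    return sendEmail(job.clientEmail, job.result);
}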
From: Joseph M. Newcomer on 29 Mar 2010 14:04
See below...
On Mon, 29 Mar 2010 10:19:07 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>
> "Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:1osvq5pshcrt4j4usqe2qlkfa29ap4tujo(a)4ax.com...
>> See below...
>> On Thu, 25 Mar 2010 19:27:39 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>> What if the fault is a bad image of your DFA loaded from disk?
>>>
>>> Rebuild and reload this image.
>> *****
>> And you know this is the problem how? I built a system that was robust enough to recover from memory parity errors, and once a day took a serious error and had to rebuild its heap from scratch. It took a year to develop a system that would not fail (it was distributed on 16 processors), and it reacted by failing a request and requiring the application to re-send the request (essentially, all transactions in flight at time of the failure were aborted and rolled back). You need to very closely and tightly specify what you mean by "fault", by "tolerant", and EXACTLY what your recovery mechanism will be. You can't say "for example, pulling the plug". You have to say "In situation X, failure is going to manifest itself in fashion Y, and recovery is by procedure Z" and that enumeration must be exhaustive.
>> ****
>
> I have redefined fault tolerance with the much narrower scope of simply not losing any customer transactions, and reporting any transactions that take too long.
****
You have not defined how you plan to achieve this. As I explained earlier, you need a timeout. And the timeout will probably exceed 500ms, and all the effort is going to be on the client side. If you have to restart the app, and you've lost the TCP/IP connection, there is NO WAY to NOT "lose" the transaction! Your failure to deal with the concept that "no client will be charged for an uncompleted transaction" is going to get you into trouble (seriously).

You have not suggested a mechanism by which a lost transaction (e.g., due to catastrophic server failure) will be sent back to the client, so there is no recovery mechanism. To do recovery after a fault (one of the essential characteristics of fault tolerance) you have to posit a recovery mechanism. You have not done this. Yes, the use of MySQL, or some other transacted file mechanism, to record the incoming transaction is essential, but you need to justify why the page faults this will induce (greater than 0 by some considerable amount) will be acceptable given your (rather silly) "no page faults are acceptable" criterion. And you need to posit a "rollback" mechanism that handles crediting properly (oh, the best way would be a background process that handles the charging and credits, but this can interfere with the performance issues you hold so dear).

You have not posited a way to inform the client of the failure (other than timeout, which will probably exceed 500ms) nor a mechanism to re-establish the transaction (which requires the client side re-submitting it! You can't reconnect to the client! TCP/IP doesn't allow this as a concept!) Instead, you just hope there is a zero-cost handwave that is going to magically solve all this.

To do this, you need a deep understanding of the underlying mechanisms. Which you persist in not acquiring, because you want everything boiled down to its "essence". OK, here's the essence of fault tolerance: IT IS HARD. AND YOU HAVE TO UNDERSTAND HOW EVERY COMPONENT PLAYS A PART IN IMPLEMENTING IT.
And to understand the components, you have to have a DEEP understanding of the underlying mechanisms!

Here's the simplest approach to fault tolerance:
(a) screw it. Abort the transaction, put the component (app, system, subroutine) into an initial state, and start the next transaction
(b) having aborted the transaction, let someone else discover this and deal with whatever the recovery plan is

I've built several fault-tolerant systems using this methodology. Let me tell you, exceptions are your new best friend. Transacted file systems are essential if there is a system termination; the recovery on system failure requires persistent storage (in one system, I added new APIs to the operating system that kept persistent state in the kernel; if the kernel crashed, nobody cared that my app had failed, so this was a good repository)
joe

>> ****
>> Unless it fails...you just said, you are going to have to rebuild. So you move the response
>
> Maybe include an MD5 hash in the process to verify whether or not the image needs to be rebuilt.
****
Ohhh, what an insight! If the data is bad, why do you think "rebuilding" the same data is going to make the problem go away? Note that parity checking will generally detect data corruption errors. And, if you have been reading at all, you would know that you could use VirtualAlloc to change your pages to "read-only" so they could not accidentally change, so only a hardware failure (which will probably be detected) will corrupt the data; otherwise, the errors will be intrinsic to the data, so rebuilding will reintroduce the error (I solved this by erasing my heap and rebuilding it from scratch; not an easy thing to do unless you "own" the heap allocation code, which I did)

>
> What are the causes of a corrupted image?
> (1) Software faults
> (2) Hardware faults
>
> What can I do about (1)? Write the best code that I can and make sure that the OS is a very stable release.
> What else can I do about (1)? I don't know; what do you know, Joe?
****
(1) ignore it. See the above solution; if there is a software fault, rebuilding the data will not make it go away. Log the problem so you can fix it, abort the transaction, and go on to the next transaction. Then, somehow (usually by some external mechanism) detect that the transaction has failed. DO NOT RETRY THE TRANSACTION in this case, because this will block your server, failing repeatedly on the same transaction and blocking all future transactions.

Instead, I would, just before I start the processing of a transaction, record in the persistent store that this transaction is "active", make sure that update is committed to the database, and upon resumption, ignore any pending transactions that were active. Only when the transaction was completed and the response sent to the user would I mark it as "completed" and commit that update to the disk. This keeps you out of the bashing-your-head-against-a-brick-wall syndrome where your program fails on a transaction, restarts it, fails, restarts it, fails, and so on, indefinitely.

Note that there are at least two commits on each transaction, perhaps three (one to enter it, one to say it is active, and one to say it is completed). While the "is completed" transaction doesn't count in the budget for the transaction itself, it counts against starting the next transaction, thus adding to its processing time. But, of course, this is inconsistent with the "zero page faults" goal.
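The bookkeeping described above (mark the transaction "active" and commit before processing, mark it "completed" and commit afterwards, and never blindly retry whatever was "active" at the time of a crash) is small once a transacted store is in place. A sketch, again using SQLite only as a stand-in; the table, column, and state names and runOcr() are hypothetical:

// Sketch of the active/completed state marking, with the rule that anything
// still 'active' after a crash is parked rather than retried in a loop.
// SQLite is a stand-in store; table, states, and runOcr() are hypothetical.
#include <sqlite3.h>
#include <cstdio>

static void exec(sqlite3* db, const char* sql) {
    sqlite3_exec(db, sql, nullptr, nullptr, nullptr);
}

static void setState(sqlite3* db, long long id, const char* state) {
    char sql[128];
    std::snprintf(sql, sizeof(sql),
                  "UPDATE requests SET state='%s' WHERE id=%lld;", state, id);
    exec(db, "BEGIN IMMEDIATE;");
    exec(db, sql);
    exec(db, "COMMIT;");                  // every state change is committed
}

bool runOcr(long long id);                // hypothetical: the actual work

// At startup, before accepting new work: anything still 'active' was in
// flight when the process died. Do NOT retry it blindly (it may be the very
// request that killed us); park it for compensating action (refund, report).
void recoverAfterCrash(sqlite3* db) {
    exec(db, "UPDATE requests SET state='failed' WHERE state='active';");
}

bool processOne(sqlite3* db, long long id) {
    setState(db, id, "active");           // commit #1: survives a crash mid-work
    bool ok = false;
    try {
        ok = runOcr(id);
    } catch (...) {
        ok = false;                       // abort this transaction only; keep serving
    }
    setState(db, id, ok ? "completed" : "failed");   // commit #2
    return ok;
}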
Database commits are SLOW operations (in fact, the person who invented the concept of transacted databases was working on eliminating the concept by figuring out how to handle distributed and potentially disconnected databases, according to a friend of mine who was working with him, because of the serious performance problems with transactions)

(2) ignore it. If it happens, you have a failure. The process will terminate, and you have no control over this. So your recovery mechanism has to detect that the process has failed, and deal with what it means to recover from this.
****
>
> What can I do about (2)?
> (a) Preventative things
> Multiple servers each with RAID 1 arrays
****
Parity errors are one hardware problem. Page-in errors, when they occur, are another. Disk read failures (related to page-in errors) are a fundamental hardware problem. Data corruption on the disk is another. RAID-1 is a half-assed attempt to deal with this.

The simplest solution is to ignore the reason for hardware failures and build in a recovery mechanism that handles all catastrophic hardware failures. Note that it is up to your ISP to deal with multiple servers, clustering, file system reliability, etc., and each additional requirement you give adds to your cost.
****
>
> (b) Corrective things after a problem occurs
> Besides having the service provider replace the hardware, I don't know.
> Joe, do you know of any other alternatives that are not very expensive?
****
I just outlined them. Of course, "expensive" has different metrics. Expensive in transaction time, expensive in development time, expensive in hardware costs. Since I was always working with fixed hardware, our expenses were always in development costs, and we were willing to pay the additional transaction times to get the robustness. You should assume fixed hardware, and be prepared to spend lots of your time building recovery mechanisms, and you have to realize that there will be performance costs.
****
>
>> time from 500ms to 3.5 minutes without warning? This means you have failed to define what you mean by "fault tolerant". One of the requirements of fault tolerance is knowing what "tolerance" means.
>
> The goal of 500 ms response time is for a narrow subset of customers that will probably have their own dedicated server. I will do comprehensive stress testing before I go into production. I will do this by generating loads at the server's threshold on a 1 gigabit intranet.
****
Actually, you are making an assumption here, that they will have a server on their site. This is inconsistent with your security goals (you seem to have thought that "opening the box" was a relevant concept, but send me a server, and without any attempt to open it, I will very shortly have a copy of your program on my machine. This is because I own a number of "administrator tools" that let me copy files, reset passwords, etc. just by booting from a floppy, CD-ROM or USB device; and if you send me a computer sealed in a block of plastic or concrete (maybe with ventilation holes), I can use any number of well-known privilege escalation techniques to gain control of the machine over the network connection). Security? I See No Security Here! Welcome to the Real World.

So once that machine leaves your hands, you can assume that it will be compromised. Even if you keep it in your building, it can be compromised.

>
> I do not ever envision providing a system that can produce correct results even with defective hardware.
****
Define "correct".
If there is defective hardware, either (a) it is self-diagnosing (parity errors, etc.) and therefore results in an error condition that aborts the transaction, or (b) the errors are undetected, in which case you will produce *a* result but will have no idea if it is correct or not!
****
>
>> I suppose my experience building fault-tolerant systems is another of those annoying biases I have.
>> joe
>> ****
>
> Initially fault tolerance will only be to not lose any customer transactions, and to report any transactions that take too long.
*****
Oh, now "fault tolerance" means "a transaction that takes too long". You keep changing the definition every time you give it. Define "lose". You are still riding far behind the specification curve. Requirements are not specifications. Before you start coding, you have to at least move from the requirements to a specification of how you are going to achieve them. And those specifications have to have realizable implementations. Handwaves don't work. You have to know how to write actual code. Right now, you are at the handwave-requirements stage. You have to go to really precise requirements, to really precise specifications of how you will implement those requirements, to actual code. You are a VERY long way from even a decent requirement description. At least every time you state them, they are slightly different.
joe
****
>
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm