From: Peter Olcott on 6 Apr 2010 19:21

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:jn2nr5150ud4p3rnp06fl4nlfu2g3j34jt(a)4ax.com...
> On Mon, 5 Apr 2010 21:24:01 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>
>>> First read section 1.0, which is what I've been trying to tell you for several days:
>>> either everything happens, or nothing happens.
>>
>> I have known this since 1991.
> ****
> So why did you start insisting that pwrite guaranteed transactional integrity?
> ****

I never did this. This is another false assumption of yours.

>>> The article is quite thorough, although I'm curious why you call it a "design pattern". It
>>
>> This detailed design can be easily adapted to provide the same end-result, hence "design pattern". It may not match the Gang of Four style, but it is a design pattern just the same.
> ****
> Actually, it is not a pattern at all; in fact, we were keeping audit trails back in the

It is close enough. It is a design from which I can achieve the same functional results by following the pattern of this design.

>> A minimal amount of disk access required to ensure that processing and financial transactions are handled correctly must be acceptable.
> ****
> This is fairly inconsistent with your non-negotiable "no disk accesses" policy where you

You are confusing two distinctly different issues: (1) the OS paging out data when huge surpluses of RAM are available; (2) a possible implementation strategy for a database provider.

> even insisted that there was no need for a database to hit the disk if the database was < 100K bytes (or did you forget that you had stated that requirement?). Oh, never mind, I just realized this is covered by the Magic Morphing Requirements pattern.
> joe
> ****

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on 6 Apr 2010 21:32

See below...
On Tue, 6 Apr 2010 16:59:13 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>
> "Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:2atmr51ml9kn4bb5l5j77h3lpiqtnlq8m3(a)4ax.com...
>> See below...
>> On Mon, 5 Apr 2010 21:32:44 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>>
>>> Ah but, then you are ignoring the proposed aspect of my design that would handle all those things. What did you call it, "mirrored transactions"? I called it on-the-fly transaction-by-transaction offsite backup.
>> ****
>> If you have a "proposed aspect" I presume you have examined the budget numbers for actual dollars required to achieve this, and the complexity of making sure it works right.
>>
>> I am not ignoring the issue, I'm asking if you have ignored the realities involved in achieving it!
>
> I would simply re-implement some of the aspects of my web application such that there is another web application on another server that the first server can send its transactions to.
****
Ohh, the Magical Mechanism solution! Of course, this adds time, complexity, and cost, but what do they matter? Maybe you could talk to your ISP about "load balancing" among multiple servers? They've already got this working! At least most ISPs that plan to survive have it working already.
*****
>
>>> I don't want to ever lose any data pertaining to customers adding money to their account. I don't want to have to rely on the payment processor keeping track of this. Maybe there are already mechanisms in place that can be completely relied upon for this.
>> ****
>> If a customer can add $1 and you spend $5 making sure they don't lose it, have you won?
>
> If you don't make sure that you don't lose the customer's money, your reputation will put you out of business. If you can't afford to make sure that you won't lose the customer's money, then you can't afford to go into business.
*****
Yes, but you have to make sure the mechanisms you create to do this are cost-effective. See my earlier comments about UPS and FedEx not requiring "live signatures" for most deliveries! Sometimes you let your "insurance" pay this, and sometimes you become your own insurer (this is called, technically, being "self-insured").
joe
*****

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Hector Santos on 6 Apr 2010 22:47

Peter Olcott wrote:
>> In general, as long as you open for append, write and close, and don't leave it open, and don't use any file stat readings or seeking on your own, it works very nicely:
>
> I need to have the file opened for append by one process, opened for read/write for another process. Can't I just keep it open?

You could, but you could have coherency issues. If you use the std I/O RTL (run time library), at least under Windows, you also have caching for writing, not just for reading. See setbuf().

If you keep it open, it will flush when X number of bytes is written to the stream buffer. It is not a "shared" stream, but the writing will take place at the end of the file when it is finally flushed, explicitly or implicitly. That writing will take place at the position at the MOMENT it takes place. So you can see "order" issues under multiple threads depending on how things are "flushed". You wouldn't get corruption, but order is probably the main thing. In addition, your reader might not see the request because it wasn't flushed.

To avoid this, you have to either open the file in commit mode (no buffering), or call setbuf(fp, NULL), so there is some speed lost there. Plus, what happens if the machine crashes? You lose what hasn't been flushed.

The difference is that you lower any issues at the expense of some speed and open/close overhead, which will be very minor for you.

> If I do close it like you suggest, will it being opened by one process prevent it from being opened by another?

No, because fopen("filename","at") opens the file in:

FOR YOU: GENERIC_WRITE
FOR OTHERS: FILE_SHARE_READ | FILE_SHARE_WRITE

> It seems like one process could append and another one read/write without interfering with each other.

See CreateFile() GENERIC_XXXXXX and FILE_SHARE_xxxxxx. The std I/O RTL functions all use the WIN32 functions, and the std I/O open/access attributes are mapped to CreateFile() open/access attributes.
>> However, if you really wanted a guarantee, then you can use a critical section, a named kernel object (named so it can be shared among processes), or use sharing-mode open file functions with a READ ONLY sharing attribute. Using CreateFile(), it would look like this:
>
> It would be simpler to bypass the need for this and simply delegate writing the transaction log file to a single thread.

It helps to have a single-point I/O controller, but how are you planning to use this thread? How will you talk to it? IOW, now you really need to make sure you have synchronization.

> Also, if the OS already guarantees that append is atomic, why slow things down unnecessarily?

Using the append as I suggested works and is simpler for your needs. You don't have any special needs that this will prevent you from doing. As indicated before, you are MUCH slower than the computer. You can keep the file open, but since you will need to make it non-buffering anyway, you are not going to lose much at all, as I have shown. But as long as you make it non-buffered it should work fine.

>> If that is all *nix has to offer, historically, using named pipes can be unreliable, especially under multiple threads.
>
> There are several different types of IPC. I chose the named pipe because it is inherently a FIFO queue.

So is every other IPC concept. For your need, named pipes are more complex and can be unreliable and very touchy if you don't do it right. I mean, your I/O needs to be 100% precise, and that can't be done in less than 20-30 lines of code, and for what you need, 3-4 lines of code are sufficient. Unless you get a Named Pipe class that will do all the work, error checking (like error 5/32 sharing-violation timings, etc.), exceptions, and proper full-duplex communications, you can certainly run into an ugly mess. I don't recommend it for you. You don't need it.
>> But since you continue to mix up your engineering designs, and you need to get that straight, process vs threads, the decision will decide what to use.
>
> The web server will be a process with one thread per HTTP request. The OCR will be a process with at least one thread. I may have multiple threads for differing priorities and have the higher priority thread preempt the lower ones, such that only one thread is running at a time.

You are certainly a character. You want complexity, yet you are not willing to follow standard design practices.

What about your HTTP request and response model? Does the above incorporate a store-and-forward concept? Meaning, don't forget that you have a RESPONSE to provide. You just can't ignore it. You have to at least respond with: "This will take a long time, we will email you when done."

If you are going to add the complexity to change thread priorities, etc., just make the damn OCR process multi-thread ready like this DESIGN is crying for and stop the bull$hit already.

>> While there are methods to do cross-machine MESSAGING, like named pipes, it is still fundamentally based on a file concept behind the scenes; they are just "special files".
>
> The processes are on the same machine. Apparently this "file" is not a "disk" file; everything occurs in memory.

Start ProcessExplorer or FileMon and open your named pipe and BEHOLD! Look, it is not a "disk" file like you are thinking. It is a FILE HANDLE like the rest, which is why you can use ReadFile() and WriteFile() and treat it like a kernel object. But it is MEMORY, and you learned your lessons on Virtual Memory. It is a MEMORY MAP! It's like the System Change Journal file. You don't see it as a file. And guess what? Under UNIX it is a file, or "volatile names (freed after the last reference to them is closed) allocated in the root directory of the named pipe filesystem (NPFS)..."
See http://en.wikipedia.org/wiki/Named_pipe

> At this early stage of my learning process I also need to get physical so that I better understand what kinds of things are feasible, and the degree of difficulty in implementing the various approaches.

Unless you roll up your sleeves, you will still be at this for another 10 years.

>> WARNING:
>>
>> One thing to remember is that DBAs (Database Admins) value their work and are highly paid. Do not argue or dispute with them as you
>
> I did non-SQL database programming for a decade.

Then do that!

>> Well, to do that you have no choice but to implement your own file sharing class as shown above. The concept is basically a Log Rotator.
>> You can now update the CRequestHandlerAbstract class with one more method requirement:
>
> I am not sure if that is true. One process appends to the file. Another process uses pread() and pwrite() to read and write to the file. These are supposed to be guaranteed to be atomic, which I am taking to mean that the OS forces them to occur sequentially.

Ok, go get yourself a Btree/ISAM database library! That will give you the fast database you need with all the controls you want. You said you did non-SQL database work for 10 years, so you should not have a problem here.

BTW, a btree/ISAM database system is what we use in our high-end multi-threaded RPC server, but it's coupled with memory-mapping technology to speed up large-file I/O. The speed is 2nd to none. The only thing being done now is to make it 64-bit I/O: not 64-bit compiled, but 64-bit read/write.

--
HLS
From: Liviu on 7 Apr 2010 00:03 "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote... >>> >>> http://en.wikipedia.org/wiki/Sparse_matrix >>> To be unambiguously distinguished from: >>> >>> In the subfield of numerical analysis, a sparse matrix is a >>> matrix populated primarily with zeros (Stoer & Bulirsch >>> 2002, p. 619). The term itself was coined by Harry M. >>> Markowitz. > > At the time that I wrote the above patent it seemed that the computer > science definition of the term was comparable to the numerical > analysis definition. This same definition seems to predominate the use > of the term when the term [sparse matrix] is searched on google. Excellent. Let wikipedia be the reference, and google be the judge. > I hung out in the misc.int.property group while I was prosecuting my > patent. One of the things that I found there was that a patent has to > be so clear that even a great effort to intentionally misconstrue its > meaning will fail. I tried to let this standard be my guide. And the best you came up with in terms of clarity was this? || The term "Sparse Matrix" is taken to have the common meaning || of the term "Sparse" combined with the common computer science || meaning of the term "Matrix", a two dimensional array of elements. Brilliant. Though it might leave the door open to competing patents which could define their own notion of "sparse matrix" as, for example, "taken to have the mathematical meaning of 'sparse' combined with the pop culture meaning of 'matrix'". Liviu
From: Peter Olcott on 7 Apr 2010 10:57
"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:tginr5tncrmn9hdblc0barqlkniklfkr4m(a)4ax.com...
> See below...
> On Tue, 6 Apr 2010 16:51:26 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>
> We used an RDBMS to manage event logs in several products, because the event log had to not only record events, but record when they were processed by the operator, and NOBODY even SUGGESTED that the RDBMS was "too expensive"; in fact, we never cared in the slightest. One site required we handle 400 events/minute, and by writing a simulator, I flooded the network and peaked out at 1300 events/minute, with MANY database accesses per event. So, somehow, you are absolutely convinced, with ZERO evidence, that SQLite is not going to handle your problem!

100 events per second is 6,000 per minute. In this case one disk access per event is better than several, and one data file compared to two (data + index) would tend to be more reliable. The 6,000 per minute is on the single-core machine, and even on this machine it might double to 12,000 per minute because hyperthreading is available. We get to 48,000 events per minute on the quad-core box.

> I grade this with an "F" and send it back to the student to redesign. I realize you may support the teaching of Evolution, but in this case just a *little bit* of Intelligent Design is appropriate. I see none whatsoever here.
>
> You want to use SQL for a FIFO but not for a log? Oh, seriously, Give Me A Break!
> ****
>
>> It seems like one process could append and another one read/write without interfering with each other.
> *****
> OMG! You think you can completely design a system that is transactionally safe, from scratch, without any effort or incurring any performance penalty!

The only big performance penalty is the forced flush to disk.

>> It would be simpler to bypass the need for this and simply delegate writing the transaction log file to a single thread.
> ****
> Sure, why not? Sounds like a sound design to me! And is probably the way I would do it, because that's what I already do!

Good, we agree on something.

> ****
>> Also, if the OS already guarantees that append is atomic, why slow things down unnecessarily?
> ****
> It does? Only in your fevered imagination! Do you KNOW what is required to make pwrite work correctly in the case of multiple concurrent threads trying to pwrite? No, you read one sentence about pwrite being atomic and think you understand the problem!

I take this to mean simply that the OS forms an internal queue of all pwrite() requests to the same file, so then this would be exactly what it takes.

> Sorry, I've actually done this, and it is NOT straightforward. For example, what is the byte offset? How do you determine it? What are the race conditions if two threads want to append to the same file at the same time? Ohh, but don't worry, pwrite is "atomic". So by some magical handwave, all problems are inherently solved!

As long as the OS sequences the appends, there is no race condition; they merely occur in sequence. In this case the OS itself becomes the single thread.

> If this is so easy, write the code that does it, and I will probably take less than 5 minutes (more likely less than 1 minute, because I know exactly what mistakes to look for!) to demonstrate why it cannot possibly work reliably! The atomicity of pwrite will not actually be a factor! fseek/write will work just as well (or as badly).

Can you explain the specific scenario where sequencing does not solve this problem?

>>> CloseHandle(h);
> ****
> Actually, the CloseHandle is the secret that makes this work. By doing a CloseHandle, the file size is forced to include the appended data.
> Note that this whole thing is not transactionally safe, since the CloseHandle does NOT guarantee the buffers are flushed to the disk,

It is easy to force the buffers to be flushed to disk even without closing the file.

> and in fact they probably won't be, so the next append will append to the file system cache in memory as well. In fact, we don't even know if, on a crash, the directory blocks have been properly updated on the disk, since CloseHandle does not require that for correctness. But at least this handles the issues of conflicts between threads (badly, insofar as performance, forcing n * 60ms delay on some transactions, but we believe performance no longer matters anyway!), even if it doesn't guarantee that the log is intact!
> *****
>> The web server will be a process with one thread per HTTP request. The OCR will be a process with at least one thread. I may have multiple threads for differing priorities and have the higher priority thread preempt the lower ones, such that only one thread is running at a time.
> *****
> Ohh, let's all shout it out: PRIORITY INVERSION! I've never seen anyone who wanted "real

The simple way around this is to simply use multiple processes that all have their own data. My code is so tight it takes almost no room at all, and now that I optimized the DFA for space, its requirements are also very small. No shared resource, no priority inversion! The data load time that I was so worried about previously now becomes completely moot with the new design.

> time" responses screw up so many fundamental ideas about real time before, and this just adds to the list. Mucking around with thread priorities is always a VERY dangerous game in ANY operating system, but especially in Windows, and to an only slightly lesser degree in linux.
> Quick, without looking it up on google, somebody tell me about the 'nice' command and its reason for existence, and why nobody actually ever used it....
>
> And how, exactly, do you plan to implement this concept that the higher-priority thread will preempt the lower-priority thread? Oh, yes, let the scheduler do it. At which point, the concept of "only one thread is running at a time" is NOT under your control, and in fact, is a really stupid statement to make, since you have no way to enforce it,

Yes, I do have an easy way to enforce it. I will let you figure out your own false assumptions on this one. The first most significant false assumption is that I am constrained by the limits of the OS scheduler.

> not on Windows, and not on linux. And if you bring the SuspendThread API into the discussion, it will prove you are totally clueless. One way to always tell an amateur is they think they can explicitly suspend other threads and end up with a system that even pretends to work. It won't. And any other approach (such as WF[SM]O) is probably doomed, because it imposes massive overheads without gaining any benefits.
>
> And, you can look this up, tell me about linux's anti-starvation guarantees, and what a really cool scheduler it has (actually, its scheduler sucks). As a start, to see how bad the scheduler is, see http://lwn.net/Articles/176635/. When I first read this, I thought it was, perhaps, a joke. Sadly, it is not.
> ****

No, you guessed wrong on both of these.

>> At this early stage of my learning process I also need to get physical so that I better understand what kinds of things are feasible, and the degree of difficulty in implementing the various approaches.
> *****
> First, get the design right. Once you have the design right, this tells you what implementation techniques become feasible. Then, build some prototypes using some of

This limits the options too much.
Going back and forth between abstract design and concrete implementation provides more options and thus a more optimal design.

> these techniques, and MEASURE THE HELL out of them. How many transactions per second can

On those things that are unknown this is good advice; on those things that can be derived by analysis it is not such good advice.

> you achieve with a SQL-based log? Is it sufficient for your needs? If so, STOP trying to do some kind of "optimum" design and use what WORKS. I saturated our system at 1300

I might expect 48,000 transactions per minute. If it takes the same or less effort to make the fastest possible design, then this is the way that I go. Only when this takes substantially more time do I consider testing to see if the quick-to-implement alternative design is good enough. I know that I can build the fastest log-file design from scratch in much less time than it would take me to implement it in SQL.

> I'd build an echo-server that took the user input and wrote it right back. I'd try a simple FIFO queue and a simple SQL database to handle the transaction billing, and test it

The transaction log file is my FIFO queue. Hector is proposing a similar design.

> like crazy, both by trying to saturate its performance (to see when it "fell over") and by simulating "catastrophic failure" by setting breakpoints at critical places and applying forced failure modes (in the case of Unix, "kill -9") when those breakpoints were reached. Then I'd know the correctness. And I'd know where to set the breakpoints from my state diagram in the specifications document. I would not worry about anything else until I had demonstrated that the simple implementation failed to meet the performance requirements.
>
> joe

Yes, I will do that after I implement the design.