From: Peter Olcott on 12 Apr 2010 16:05

"Jerry Coffin" <jerryvcoffin(a)yahoo.com> wrote in message
news:MPG.262d1770adba771989867(a)news.sunsite.dk...
> In article <jdSdnYVeeN8mq17WnZ2dnUVZ_qidnZ2d(a)giganews.com>,
> NoSpam(a)OCR4Screen.com says...
>
> The scheduling algorithm does NOT boil down to essentially (or even
> remotely) the frequency and/or duration of time slices. It happens as
> I already described: the highest priority tasks get (essentially) all
> the processor time. Like Windows, Linux does have a starvation
> prevention mechanism, but 1) it basically works in opposition to the
> priority mechanism, and 2) it only redistributes a small percentage
> of processor time, not anywhere close to the 20% you're looking for.

That sure sounds screwy to me. Of the 40 different priority levels
available on Linux, a process with a priority of 0 would starve a
process with a priority of 1? Can you prove this?

>> I see no other way to provide absolute priority to the high
>> priority jobs (paying customers) over the low priority jobs (free
>> users). Also I see no way that this would not work well. If I get
>> enough high priority jobs that the lower priority jobs never ever
>> get a chance to run, that would be fantastic. The whole purpose of
>> the free jobs is to get more paying jobs.
>
> I see Joe has already commented on the technical aspects of this, so
> I won't bother. I'll just add that if you think delaying a free job
> indefinitely is going to convince somebody to pay for your service,
> your understanding of psychology is even more flawed than your
> understanding of operating systems.

The ONLY purpose of the free jobs is to get paying jobs. The only way
that a free job would never get done is if my website were earning $10
per second, 24/7/365, which comes to $864,000 per day. Long before
that ever happens I will set up a cluster of servers just for the
free jobs.

>> If you see something specifically wrong with this approach, please
>> point out the specific dysfunctional aspect. I see no possible
>> dysfunctional aspects with this design.
>
> Perfect designs are sufficiently rare that if you see no possible
> dysfunctional aspects to a design, it's essentially proof positive
> that you don't understand the design.

What I am saying is that telling me my design is bad without telling
me what is bad about it is far worse than useless. In more than half
of the cases so far, what was supposedly bad about my design was not
the design itself but a misconception of it. Saying only that it is
bad, without explaining why you think it is bad, is harassment rather
than help.

>> Block the IP long before that.
>
> That has (at least) two serious problems. First of all, for a DoS
> attack, the sender doesn't care about receiving replies (in fact,
> doesn't *want* to receive replies) so he'll normally generate each
> packet with a unique IP address in the "From" field.
>
> Second, there are distributed denial of service attacks that
> (typically) use "botnets" of machines that have been infected with
> malware that allows the botnet operator to control them. The Mariposa
> botnet (recently shut down, at least partially, when Spanish law
> enforcement arrested three operators) controlled machines using over
> 11 million unique IP addresses.
>
> --
> Later,
> Jerry.

So what else can be done, nothing?
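[The starvation question above can be tested empirically rather than
argued. A minimal sketch, assuming Linux; the 10-second window and the
iteration counts are arbitrary choices, not from any post:

// starve_test.cpp -- does a nice-0 process starve a nice-1 process,
// or merely outrun it? Build with: g++ starve_test.cpp -o starve_test
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

static void spin(int nice_level, const char *label) {
    setpriority(PRIO_PROCESS, 0, nice_level);  // set this child's nice value
    alarm(10);                                 // SIGALRM ends the child after 10 s
    volatile unsigned long count = 0;
    for (;;) {
        count = count + 1;
        if (count % 100000000UL == 0)          // periodic progress report
            fprintf(stderr, "%s: %lu iterations\n", label, (unsigned long)count);
    }
}

int main() {
    if (fork() == 0) spin(0, "nice 0");  // higher-priority child, never returns
    if (fork() == 0) spin(1, "nice 1");  // one level lower
    wait(nullptr);                       // reap both children
    wait(nullptr);
    return 0;
}

Run it pinned to a single core (taskset -c 0 ./starve_test) so the two
children actually compete. If the nice-1 child keeps reporting
progress, it is being throttled, not starved.]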
From: Hector Santos on 12 Apr 2010 16:05

Joseph M. Newcomer wrote:
> See below...
>> Joe, at least so far we got him to:
>>
>> - Admit to a lack of understanding of memory, so that he himself
>>   reduced the loading requirement rather than code any large-memory
>>   efficiency methods.
>>
>> - Admit that his 100 TPS was unrealistic for a 10 ms throughput that
>>   lacked consideration of the interfacing processing time outside
>>   the vaporware OCR processor, so he added another 10 ms and reduced
>>   the TPS to 50.
>>
>> Joe, I don't know about you, but I still have a few teeth left to be
>> pulled! :)
> ***
> I blame it on OCD. Some people with OCD keep washing their hands to
> get them clean; I keep returning here to see if we can educate Peter.
> I probably need therapy to stop me from trying to help him, since he
> clearly has all the answers and I'm wasting my time.
> joe
> ****

I'm going to have to go get a white chip soon at Programmers
Anonymous. :)

--
HLS
From: Joseph M. Newcomer on 12 Apr 2010 16:08

See below...

On Sat, 10 Apr 2010 08:47:00 -0600, Jerry Coffin
<jerryvcoffin(a)yahoo.com> wrote:

>In article <2IydnTUFcvDQ2CLWnZ2dnUVZ_rmdnZ2d(a)giganews.com>,
>NoSpam(a)OCR4Screen.com says...
>
>[ ... ]
>
>> I am going to use four processes for the four different levels of
>> process priority. The high priority jobs will be assigned something
>> like 80% of the CPU (relative to the low priority jobs) and the low
>> priority jobs will each get something like 7% of the CPU. Each of
>> these processes will pull jobs from one of four FIFO queues. Some
>> sort of port access might be better than a named pipe because each
>> port access could be explicitly acknowledged when it occurs.
>> Shoving something into a named pipe does not provide this benefit.
>
>This is a really bad idea. What you almost certainly want is a
>priority queue to hold the incoming tasks, with the "priority" for
>the queue based on a combination of the input priority and the
>arrival time. This will let a new input move ahead of some lower
>priority items as long as they arrived recently enough (for some
>definition of "recently enough"), but it also guarantees that a low
>priority task won't sit in the queue indefinitely -- at some point,
>it'll get to the front of the queue and be processed.
>
>This is simple, straightforward to implement, and fairly easy to be
>sure it works correctly. Despite its *seeming* simplicity, the
>method you've outlined is none of the above -- quite the contrary,
>it's a recipe for putting another 10 years (or more) of work into
>getting your process synchronization to work.

****
Note that THREE people with real experience in building complex
systems have now told you your design is wrong. Yours is the one vote
out of four claiming it is good. Has it occurred to you that if
several independent experts agree the design is bad, then just maybe
it is bad?
joe
****

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
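[A minimal sketch of the kind of aging priority queue Jerry describes:
the sort key is fixed at enqueue time and combines base priority with
arrival time, so a new high-priority job jumps ahead of recent
low-priority jobs, while an old low-priority job still reaches the
front. The Job fields, the four levels, and the 60-seconds-per-level
weighting are illustrative assumptions:

#include <ctime>
#include <queue>
#include <vector>

struct Job {
    int    base_priority;  // 0 = paying customer ... 3 = free user
    time_t arrival;        // set when the job is enqueued
    // Fixed at enqueue time, so the heap ordering stays consistent:
    // each priority level is worth 60 seconds of waiting.
    time_t key() const { return arrival + 60 * base_priority; }
};

struct EarliestKeyFirst {
    bool operator()(const Job &a, const Job &b) const {
        return a.key() > b.key();  // smallest key = front of the queue
    }
};

using JobQueue = std::priority_queue<Job, std::vector<Job>, EarliestKeyFirst>;

// Usage:
//   JobQueue q;
//   q.push(Job{3, time(nullptr)});  // free job
//   q.push(Job{0, time(nullptr)});  // paying job; q.top() returns it first

With this weighting, a paying job moves ahead of any free job
submitted within the previous 180 seconds, but a free job is served
before all paying work that arrives more than 180 seconds after it,
so delay is bounded and nothing starves.]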
From: Joseph M. Newcomer on 12 Apr 2010 16:21

See below...

On Sat, 10 Apr 2010 23:14:13 -0600, Jerry Coffin
<jerryvcoffin(a)yahoo.com> wrote:

>In article <0L6dnTk7cJuaulzWnZ2dnUVZ_u6dnZ2d(a)giganews.com>,
>NoSpam(a)OCR4Screen.com says...
>
>[ ... ]
>
>> A Linux/Unix expert, David Schwartz, says that it is nearly
>> impossible to avoid all kinds of hidden dependencies that you have
>> no control over when using threads.
>
>Threads under Linux/Unix are rather a different beast than under
>Windows. In particular, the basic Unix model of process creation
>interacts poorly with threads (to put it mildly). The problem is that
>Unix creates a new process as a clone of an existing process. With a
>single thread, this is pretty easy -- but with multiple threads, it
>gets ugly. One thread has clearly just called fork() -- but if (for
>example) another thread has initiated a write to disk that has not
>yet completed, you can't really just clone the whole state of the
>existing process into a new one.

****
Nobody remembers that the reason fork() exists is that it was
impossible to start a second process any other way on a PDP-11/20 (no
VM, for example); it was the only approach that could work. Every
operating system before or since has had the equivalent of the Windows
CreateProcess call, which avoids all of these horrible problems.
fork() is the world's worst way to create a process, for many of the
reasons you cite above. But unixoids think that whatever Unix does
must be Perfection Itself, no matter how bad it has proven to be.
joe
****

>
>For better or worse, the thread creation API in Linux is basically
>just a variation of fork() -- basically, they have a set of bit-flags
>that indicate what to keep from the existing process and what to
>create new. This means the same problems that can arise with creating
>a new multi-threaded process under Unix can also arise with creating
>new threads in the same process. Getting it to work at all took
>heroic effort, and even with that it doesn't really work well (and
>don't expect much improvement in that respect anytime soon either).
>
>Bottom line: since you're posting in a Windows-specific newsgroup,
>most of us have posted general designs that are oriented toward
>Windows. If you're designing this for Linux/Unix, then you almost
>certainly should use only one thread per process. Contrary to (at
>least the implication of) the statement above, however, the problem
>isn't really with threads themselves -- it's with how Unix (and
>especially Linux) has mis-implemented them.

****
I find the need to discuss Linux details on an MFC board more than a
little weird. I give Windows-based answers, because that's the only
operating system that matters to me.
****

>
>> For my purposes with the redesigned OCR process there is no
>> functional difference between threads and processes because there
>> is no longer any need to share data.
>
>You seem to be having difficulty making up your mind. On one hand,
>you talk about gigabytes of static data, but on the other hand, you
>say there's no need to share data. One of these two is almost
>certainly false, or at least misleading -- while sharing that static
>data may not be strictly *needed*, it would clearly be extremely
>desirable.

****
Something about "false assumptions" comes to mind here...
joe
****

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
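[For what it's worth, POSIX does offer a CreateProcess-style call,
posix_spawn()/posix_spawnp(), which constructs a new process image
directly instead of cloning the caller, so a multithreaded parent
never has to reproduce its own half-consistent state. A minimal
sketch; the spawned program and its arguments are arbitrary choices:

#include <spawn.h>
#include <sys/wait.h>
#include <cstdio>

extern char **environ;  // pass the parent's environment through unchanged

int main() {
    pid_t pid;
    char *const argv[] = { (char *)"ls", (char *)"-l", nullptr };
    // No explicit fork(): the new image is built directly from "ls".
    int rc = posix_spawnp(&pid, "ls", nullptr, nullptr, argv, environ);
    if (rc != 0) {
        fprintf(stderr, "posix_spawnp failed: %d\n", rc);
        return 1;
    }
    int status;
    waitpid(pid, &status, 0);  // reap the child like any other process
    return 0;
}

Whether a given libc implements it with vfork()/clone() underneath is
hidden from the caller.]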
From: Peter Olcott on 12 Apr 2010 16:41
"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:mfj6s5lj3bqji65f0cnbreq98utl7m11oc(a)4ax.com... > See below... > On Sun, 11 Apr 2010 23:18:07 -0600, Jerry Coffin > <jerryvcoffin(a)yahoo.com> wrote: > > Sadly, you have to realize that schedulers work like he > *imagines*, not like any real > scheduler we have ever seen! So he thinks that schedulers > worry about percentages of CPU > time when no scheduler in history has EVER worked this > way. But what does reality matter > in these designs? OK so it was apparently more of my naivety. It is far better that it works the way that you said, that is more of what I need. I will bump the primary OCR process and the web server up one level of priority and I am good to go. The alternative design can be scrapped. > But he won't listen when we tell him these things. This > is not how Peter's Fantasy > Operating System would run, so it doesn't matter! Whenever you explain the details I will listen. I will assume that the above details regarding the scheduler are correct. So all remaining discussion on option TWO will be ignored because it really looks like option TWO is now moot. The design remains the same as it has been for quite a while. The Option for priortization of the four job types becomes assigning a process priority of -1 to both the web server and the high priority OCR process. The other option of handling scheduling myself is discarded. There are still four OCR processes each with their own queue. There are still multiple threads one for each HTTP connection. SQLite is still the database provider. Either SQLite or a file will handle the transaction log, depending upon the measured performance of SQLite on this aspect. The choice of IPC between the web server and the OCR processes (how the queues are implemented) is still open, but still leaning towards named pipes. I/O completion ports have been discarded because they are not available under Linux. |