From: Ingo Molnar on 30 Nov 2006 15:40

* David Miller <davem(a)davemloft.net> wrote:

> I want to point out something which is slightly misleading about this
> kind of analysis.
>
> Your disk I/O speed doesn't go down by a factor of 10 just because 9
> other non disk I/O tasks are running, yet for TCP that's seemingly OK
> :-)

disk I/O is typically not CPU bound, and i believe these TCP tests
/are/ CPU-bound. Otherwise there would be no expiry of the timeslice to
begin with and the TCP receiver task would always be boosted to
'interactive' status by the scheduler and would happily chug along at
500 mbits ...

(and i grant you, if a disk IO test is 20% CPU bound in process context
and system load is 10, then the scheduler will throttle that task quite
effectively.)

	Ingo
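For context, here is a rough, self-contained userspace model of the
end-of-timeslice decision Ingo is referring to. The real logic lives in
scheduler_tick() in kernel/sched.c of the 2.6 O(1) scheduler; the struct
fields, the 50% "interactivity" threshold and the task names below are
purely illustrative, not the kernel's.

#include <stdio.h>

/* Very rough model of the 2.6 O(1) scheduler's end-of-timeslice
 * decision.  The real code is scheduler_tick() in kernel/sched.c;
 * all names and thresholds here are illustrative only. */

struct task {
	const char *name;
	int time_slice;		/* ticks left in the current slice */
	int sleep_avg;		/* 0..100, crude "how much it sleeps" estimate */
};

static int task_interactive(const struct task *t)
{
	/* The kernel compares a bonus derived from sleep_avg against a
	 * nice-dependent threshold; here: "slept more than half the time". */
	return t->sleep_avg > 50;
}

/* Called once per tick while the task runs.  Returns 1 if the task was
 * moved to the expired array, i.e. it now waits behind every other
 * runnable task until the arrays are switched. */
static int tick(struct task *t)
{
	if (--t->time_slice > 0)
		return 0;

	t->time_slice = 100;		/* refill: ~100ms default slice */
	if (task_interactive(t))
		return 0;		/* requeued on the active array */
	return 1;			/* CPU-bound: goes to the expired array */
}

int main(void)
{
	struct task tcp_rcv = { "cpu-bound tcp receiver", 100, 10 };
	struct task editor  = { "mostly-sleeping editor",  100, 90 };
	int i;

	for (i = 0; i < 300; i++) {
		if (tick(&tcp_rcv))
			printf("%s: expired, waits behind all other runnable tasks\n",
			       tcp_rcv.name);
		if (tick(&editor))
			printf("%s: expired\n", editor.name);
	}
	return 0;
}

In this model the mostly-sleeping task keeps its interactive bonus and
never leaves the active array, while the CPU-bound receiver repeatedly
lands on the expired array - which, with nine other CPU-bound tasks
runnable, is where the large throughput drop comes from.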
From: David Miller on 30 Nov 2006 15:40

From: Ingo Molnar <mingo(a)elte.hu>
Date: Thu, 30 Nov 2006 21:30:26 +0100

> disk I/O is typically not CPU bound, and i believe these TCP tests /are/
> CPU-bound. Otherwise there would be no expiry of the timeslice to begin
> with and the TCP receiver task would always be boosted to 'interactive'
> status by the scheduler and would happily chug along at 500 mbits ...

It's about the prioritization of the work.

If all disk I/O were shut off and frozen while we copy file data into
userspace, you'd see the same problem for disk I/O.
From: Wenji Wu on 30 Nov 2006 15:50

> It steals timeslices from other processes to complete tcp_recvmsg()
> task, and only when it does it for too long, it will be preempted.
> Processing backlog queue on behalf of need_resched() will break
> fairness too - processing itself can take a lot of time, so process
> can be scheduled away in that part too.

It does steal timeslices from other processes to complete the
tcp_recvmsg() task, but I do not think it will take long. When the
backlog is processed, the processed packets go into the receive buffer,
so TCP flow control takes effect and slows the sender down.

The data receiving process might be preempted by higher priority
processes. As long as the data receiving process stays in the active
array, the problem is not that bad, because the process will likely
resume execution soon. The worst case is that it expires and is moved
to the expired array while packets are still sitting in the backlog
queue.

wenji
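For reference, the backlog drain Wenji describes runs in process
context when the socket lock is released. The loop below is a lightly
simplified paraphrase of __release_sock() in net/core/sock.c of that
era, reconstructed from memory, so treat the details as a sketch rather
than the exact source:

/* Sketch of how the socket backlog is drained in process context.
 * Packets that arrived in softirq context while the receiver held the
 * socket lock sit on sk->sk_backlog and are fed through
 * sk->sk_backlog_rcv() (the TCP input path for TCP sockets) here, on
 * the receiving task's own timeslice. */
static void __release_sock(struct sock *sk)
{
	struct sk_buff *skb = sk->sk_backlog.head;

	do {
		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
		bh_unlock_sock(sk);

		do {
			struct sk_buff *next = skb->next;

			skb->next = NULL;
			sk->sk_backlog_rcv(sk, skb);	/* TCP input processing */
			skb = next;
		} while (skb != NULL);

		bh_lock_sock(sk);
	} while ((skb = sk->sk_backlog.head) != NULL);
}

The packets handed to sk_backlog_rcv() here end up in the receive
queue, so the advertised window (and with it TCP flow control) limits
how much work can pile up, as Wenji notes; the open question is what
happens when the task expires with that backlog still non-empty.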
From: Ingo Molnar on 30 Nov 2006 15:50

* David Miller <davem(a)davemloft.net> wrote:

> > disk I/O is typically not CPU bound, and i believe these TCP tests
> > /are/ CPU-bound. Otherwise there would be no expiry of the timeslice
> > to begin with and the TCP receiver task would always be boosted to
> > 'interactive' status by the scheduler and would happily chug along
> > at 500 mbits ...
>
> It's about the prioritization of the work.
>
> If all disk I/O were shut off and frozen while we copy file data into
> userspace, you'd see the same problem for disk I/O.

well, it's an issue of how much processing is done in non-prioritized
contexts. TCP is a bit more sensitive to process context being
throttled - but disk I/O is not immune either: if nothing submits new
IO, or if the task does short reads+writes, then any process-level
throttling immediately shows up in IO throughput.

but in the general sense it is /unfair/ that certain processing such as
disk and network IO can get a disproportionate amount of CPU time from
the system - just because they happen to have some of their processing
in IRQ and softirq context (which is essentially prioritized at
SCHED_FIFO priority 100). A system can easily spend 80% of its CPU time
in softirq context. (and that is easily visible in something like an
-rt kernel, where the various softirq contexts are separate threads and
you can see 30% net-rx and 20% net-tx CPU utilization in 'top'.) How is
this kind of processing different from purely process-context based
subsystems?

so i agree with you that by tweaking the TCP stack to be less sensitive
to process throttling you /will/ improve the relative performance of
the TCP receiver task - but in general system design and scheduler
design terms it's not a win.

i'd also agree with the notion that the current 'throttling' of process
contexts can be abrupt and uncooperative, and hence the TCP stack could
get more out of the same amount of CPU time if it used it in a smarter
way. As i pointed out in the first mail, i'd support the TCP stack
getting the ability to query how much of its timeslice is left - or
even the scheduler notifying the TCP stack via some downcall when
current->time_slice reaches 1 (or something like that).

So i dont support the scheme proposed here, the blatant bending of the
priority scale towards the TCP workload. Instead what i'd like to see
is more TCP performance (and nicer over-the-wire behavior - no
retransmits, for example) /with the same 10% CPU time used/. Are we in
rough agreement?

	Ingo
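To make that last proposal concrete, one possible shape for it - purely
hypothetical, nothing like this exists in the tree, and the helper
names (tcp_slice_nearly_expired, next_backlog_skb) are invented for
illustration - would let the process-context drain stop at a clean
packet boundary when the slice is nearly gone, instead of being
preempted mid-burst:

/* Hypothetical, not in any kernel: let process-context TCP work notice
 * that its timeslice is nearly gone and stop at a clean packet
 * boundary, rather than being preempted in the middle of a burst
 * (which is what tends to provoke sender retransmits). */
static inline int tcp_slice_nearly_expired(void)
{
	return current->time_slice <= 1;
}

/* ...inside the backlog-drain loop... */
	while ((skb = next_backlog_skb(sk)) != NULL) {	/* illustrative helper */
		sk->sk_backlog_rcv(sk, skb);
		if (tcp_slice_nearly_expired())
			break;	/* finish the rest after being rescheduled */
	}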
From: Ingo Molnar on 30 Nov 2006 16:00
* Ingo Molnar <mingo(a)elte.hu> wrote:

> [...] Instead what i'd like to see is more TCP performance (and a
> nicer over-the-wire behavior - no retransmits for example) /with the
> same 10% CPU time used/. Are we in rough agreement?

put another way: i'd like to see the "TCP bytes transferred per unit of
CPU time spent by the TCP stack" ratio maximized, in a load-independent
way (and the sender host is part of that picture too: not causing
unnecessary retransmits matters as well).

In a high-load scenario this means that any measure that purely
improves TCP throughput by giving it more cycles is not a real
improvement. So the focus should be on throttling intelligently,
without causing extra work on the sender side either - not on trying to
circumvent the throttling measures.

	Ingo
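As an aside on measuring that ratio: it can be read straight off the
receiver with getrusage(). A minimal userspace sketch follows; the
recv() loop is elided, and bytes_received is the test program's own
counter, not a kernel statistic.

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Report "bytes moved per CPU second actually consumed" - the figure
 * of merit argued for above - instead of raw wall-clock throughput. */
static double cpu_seconds_used(void)
{
	struct rusage ru;

	getrusage(RUSAGE_SELF, &ru);
	return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) +
	       (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}

int main(void)
{
	unsigned long long bytes_received = 0;
	double cpu;

	/* ... the benchmark's recv() loop would go here, adding each
	 * return value to bytes_received ... */

	cpu = cpu_seconds_used();
	if (cpu > 0.0)
		printf("%.1f MB per CPU second\n",
		       bytes_received / 1e6 / cpu);
	return 0;
}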