sched: implement __set_cpus_allowed() [Kernel]

Prev: [tip:x86/urgent] x86/mm: Remove unused DBG() macro
Next: sched: consult online mask instead of active in select_fallback_rq()

From: Peter Zijlstra on 31 May 2010 04:10

On Thu, 2010-05-13 at 12:48 +0200, Tejun Heo wrote:
> Concurrency managed workqueue needs to be able to migrate tasks to a
> cpu which is online but !active for the following two purposes.
>
> p1. To guarantee forward progress during cpu down sequence. Each
> workqueue which could be depended upon during memory allocation
> has an emergency worker task which is summoned when a pending work
> on such workqueue can't be serviced immediately. cpu hotplug
> callbacks expect workqueues to work during cpu down sequence
> (usually so that they can flush them), so, to guarantee forward
> progress, it should be possible to summon emergency workers to
> !active but online cpus.

If we do the thing suggested in the previous patch, that is move
clearing active and rebuilding the sched domains until right after
DOWN_PREPARE, this goes away, right?

> p2. To migrate back unbound workers when a cpu comes back online.
> When a cpu goes down, existing workers are unbound from the cpu
> and allowed to run on other cpus if there still are pending or
> running works. If the cpu comes back online while those workers
> are still around, those workers are migrated back and re-bound to
> the cpu. This isn't strictly required for correctness as long as
> those unbound workers don't execute works which are newly
> scheduled after the cpu comes back online; however, migrating back
> the workers has the advantage of making the behavior more
> consistent thus avoiding surprises which are difficult to expect
> and reproduce, and being actually cleaner and easier to implement.

I still don't like this much, if you mark these tasks to simply die when
the queue is exhausted, and flush the queue explicitly on
CPU_UP_PREPARE, you should never need to do this.

After which I think you don't need this patch anymore..
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Tejun Heo on 31 May 2010 06:00

Hello,

On 05/31/2010 10:01 AM, Peter Zijlstra wrote:
> On Thu, 2010-05-13 at 12:48 +0200, Tejun Heo wrote:
>> Concurrency managed workqueue needs to be able to migrate tasks to a
>> cpu which is online but !active for the following two purposes.
>>
>> p1. To guarantee forward progress during cpu down sequence. Each
>> workqueue which could be depended upon during memory allocation
>> has an emergency worker task which is summoned when a pending work
>> on such workqueue can't be serviced immediately. cpu hotplug
>> callbacks expect workqueues to work during cpu down sequence
>> (usually so that they can flush them), so, to guarantee forward
>> progress, it should be possible to summon emergency workers to
>> !active but online cpus.
>
> If we do the thing suggested in the previous patch, that is move
> clearing active and rebuilding the sched domains until right after
> DOWN_PREPARE, this goes away, right?

Hmmm... yeah, if the usual set_cpus_allowed_ptr() keeps working
throughout CPU_DOWN_PREPARE, this probably goes away. I'll give it a
shot.

>> p2. To migrate back unbound workers when a cpu comes back online.
>> When a cpu goes down, existing workers are unbound from the cpu
>> and allowed to run on other cpus if there still are pending or
>> running works. If the cpu comes back online while those workers
>> are still around, those workers are migrated back and re-bound to
>> the cpu. This isn't strictly required for correctness as long as
>> those unbound workers don't execute works which are newly
>> scheduled after the cpu comes back online; however, migrating back
>> the workers has the advantage of making the behavior more
>> consistent thus avoiding surprises which are difficult to expect
>> and reproduce, and being actually cleaner and easier to implement.
>
> I still don't like this much, if you mark these tasks to simply die when
> the queue is exhausted, and flush the queue explicitly on
> CPU_UP_PREPARE, you should never need to do this.

I don't think flushing from CPU_UP_PREPARE would be a good idea.
There is no guarantee that those works will finish in short (human
scale) time, but we can update cpu_active mask before other
CPU_UP_PREPARE notifiers are executed so that it's symmetrical to cpu
down path and then this problem goes away the exact same way, right?

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Peter Zijlstra on 31 May 2010 06:10

On Mon, 2010-05-31 at 11:55 +0200, Tejun Heo wrote:
> I don't think flushing from CPU_UP_PREPARE would be a good idea.
> There is no guarantee that those works will finish in short (human
> scale) time,

If not, we should cure it by ensuring these works won't be left
lingering I think.

Also, I really don't much care about how fast we can hotplug cycle
things -- its an utter slow path.

> but we can update cpu_active mask before other
> CPU_UP_PREPARE notifiers are executed so that it's symmetrical to cpu
> down path and then this problem goes away the exact same way, right?

Ah, no, we cannot mark it active before its actually up, because at that
time we'll actually try and run stuff on it, which clearly won't work
when its not there to run stuff.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Tejun Heo on 31 May 2010 06:10

On 05/31/2010 12:02 PM, Peter Zijlstra wrote:
> On Mon, 2010-05-31 at 12:01 +0200, Peter Zijlstra wrote:
>>> but we can update cpu_active mask before other
>>> CPU_UP_PREPARE notifiers are executed so that it's symmetrical to cpu
>>> down path and then this problem goes away the exact same way, right?
>>
>> Ah, no, we cannot mark it active before its actually up, because at that
>> time we'll actually try and run stuff on it, which clearly won't work
>> when its not there to run stuff.
>
> So we should clear it _before_ we go down, and set it _after_ we're up.
>

Yeah, sure, I misspoke. I meant CPU_ONLINE not CPU_UP_PREPARE. So,
we can mark a cpu active before other CPU_ONLINE callbacks are run.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Tejun Heo on 31 May 2010 06:20

Hello,

On 05/31/2010 12:15 PM, Peter Zijlstra wrote:
> On Mon, 2010-05-31 at 12:06 +0200, Tejun Heo wrote:
>> I meant CPU_ONLINE not CPU_UP_PREPARE. So,
>> we can mark a cpu active before other CPU_ONLINE callbacks are run.
>
> OK, that should work.

The only remaining problemw is PF_THREAD_BOUND. It needs to be set
for wq workers (including rescuers). Would it be okay to add an
interface which only ignores PF_THREAD_BOUND?

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

| Next | Last
Pages: 1 2
Prev: [tip:x86/urgent] x86/mm: Remove unused DBG() macro
Next: sched: consult online mask instead of active in select_fallback_rq()