From: Vladislav Bolkhovitin on 3 Jun 2010 13:50

Boaz Harrosh, on 06/03/2010 08:09 PM wrote:
> [Topic]
> How to not let pages change while in IO
>
> [Abstract]
> As seen in a long thread on the fsdevel and scsi mailing lists, lots of
> people have headaches and sleepless nights because individual pages
> can change while in IO and/or DMA. Though each one has slightly different
> needs, the mechanics look to be the same.
>
> People that care:
> - Mirror and RAID people that need on-disk consistency.
> - Network storage that wants data checksums.
> - DIF/DIX people

- Load-balancing MPIO clusters, where out-of-order execution of overlapping
  write requests for the changed pages can introduce data corruption, which
  makes using Linux with load-balancing MPIO clusters unsafe.

> - ...
>
> I for one know nothing of the subject but am a RAID person and would
> like a solution that does not force me to copy the complete data load.
>
> Please, let's get all the VM, VFS and drivers people in one room and see
> if we can have a Linux solution to this problem.
>
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jan Kara on 4 Jun 2010 12:30

On Thu 03-06-10 19:09:52, Boaz Harrosh wrote:
> [Topic]
> How to not let pages change while in IO
>
> [Abstract]
> As seen in a long thread on the fsdevel and scsi mailing lists, lots of
> people have headaches and sleepless nights because individual pages
> can change while in IO and/or DMA. Though each one has slightly different
> needs, the mechanics look to be the same.
Hmm, I don't think it's really about "how to not let pages change" - that
is doable by using wait_on_page_writeback() in ->page_mkwrite and
->write_begin. I think the discussion is more about whether we should do it,
or whether we should rechecksum and resubmit IO in case of checksum failure,
as Nick proposed...

Honza

> People that care:
> - Mirror and RAID people that need on-disk consistency.
> - Network storage that wants data checksums.
> - DIF/DIX people
> - ...
>
> I for one know nothing of the subject but am a RAID person and would
> like a solution that does not force me to copy the complete data load.
>
> Please, let's get all the VM, VFS and drivers people in one room and see
> if we can have a Linux solution to this problem.
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
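To make the two options Jan contrasts concrete, here is a toy userspace model in C. It is a sketch only, not kernel code: `struct toy_page`, its `writeback` flag and the `toy_*` functions are hypothetical stand-ins for a page-cache page, PG_writeback, submitting/completing writeback and a write fault, and the Adler-32-style checksum stands in for whatever checksum the transport or DIF actually uses. In "stable" mode a write to a page under writeback is refused, where the kernel would instead sleep in wait_on_page_writeback() from ->page_mkwrite or ->write_begin. In "unstable" mode the write goes through, the checksum computed at submit time no longer matches, and `toy_writeout_retry()` recomputes it and resubmits, roughly the alternative Jan attributes to Nick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 64

/* Hypothetical stand-in for a page-cache page. */
struct toy_page {
    uint8_t data[TOY_PAGE_SIZE];
    bool writeback;              /* models PG_writeback: IO in flight */
};

/* Adler-32-style checksum; a stand-in for the real one (e.g. T10 DIF CRC). */
static uint32_t toy_checksum(const uint8_t *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* Start writeback: the checksum is taken from the page as it is now. */
static uint32_t toy_submit(struct toy_page *p)
{
    p->writeback = true;
    return toy_checksum(p->data, TOY_PAGE_SIZE);
}

/* The "device" reads the page later; true if data still matches checksum. */
static bool toy_complete(struct toy_page *p, uint32_t sent_csum)
{
    bool ok = toy_checksum(p->data, TOY_PAGE_SIZE) == sent_csum;
    p->writeback = false;        /* models end of writeback */
    return ok;
}

/*
 * A write fault on the page. In stable mode a writer must not touch a page
 * under writeback; here we return false ("would block") instead of sleeping.
 */
static bool toy_write(struct toy_page *p, size_t off, uint8_t val, bool stable)
{
    assert(off < TOY_PAGE_SIZE);
    if (stable && p->writeback)
        return false;
    p->data[off] = val;
    return true;
}

/*
 * Unstable mode plus "rechecksum and resubmit": a racing writer modifies the
 * page during attempt number race_attempt (0 = never). Returns the number of
 * submissions needed, or -1 if max_tries was exhausted.
 */
static int toy_writeout_retry(struct toy_page *p, int race_attempt, int max_tries)
{
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        uint32_t csum = toy_submit(p);
        if (attempt == race_attempt)
            (void)toy_write(p, 0, (uint8_t)(p->data[0] + 1), false);
        if (toy_complete(p, csum))
            return attempt;
    }
    return -1;
}
```

In this model, stable mode costs the writer a stall but never a second transfer, while the retry approach never stalls the writer but pays an extra checksum pass and resubmission each time a page is redirtied under IO.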
From: Boaz Harrosh on 6 Jun 2010 05:40

On 06/04/2010 07:23 PM, Jan Kara wrote:
> On Thu 03-06-10 19:09:52, Boaz Harrosh wrote:
>> [Topic]
>> How to not let pages change while in IO
>>
>> [Abstract]
>> As seen in a long thread on the fsdevel and scsi mailing lists, lots of
>> people have headaches and sleepless nights because individual pages
>> can change while in IO and/or DMA. Though each one has slightly different
>> needs, the mechanics look to be the same.
> Hmm, I don't think it's really about "how to not let pages change" - that
> is doable by using wait_on_page_writeback() in ->page_mkwrite and
> ->write_begin. I think the discussion is more about whether we should do it,
> or whether we should rechecksum and resubmit IO in case of checksum failure,
> as Nick proposed...
>
> Honza

I have hijacked the DIF threads, but no, my proposal is for a general
toolset that could be used for all of the above, as well as for DIF if needed.

Surely, even with DIF, keep-constant vs. retransmit is a matter of
machine + link speed multiplied by faulting workloads. So there might be
situations where an admin wants to choose.

With other, non-checksum features like RAID5/MIRROR, retransmit is not
always an option, and it becomes keep-constant vs. copy (that is, a copy of
the complete workload). So for these setups the choice is clear, no?

I'm glad that you think it is easy/doable to implement, and I'll surely
test your recipe above. Do you think it would be acceptable as a generic
per-sb tunable? So that, for instance, ext3 over RAID5 could turn this on
and eliminate the data copy?

Let's talk about this at LSF.

Boaz

>> People that care:
>> - Mirror and RAID people that need on-disk consistency.
>> - Network storage that wants data checksums.
>> - DIF/DIX people
>> - ...
>>
>> I for one know nothing of the subject but am a RAID person and would
>> like a solution that does not force me to copy the complete data load.
>>
>> Please, let's get all the VM, VFS and drivers people in one room and see
>> if we can have a Linux solution to this problem.
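The keep-constant vs. copy choice for RAID5 can be sketched the same way, again as a toy model under stated assumptions rather than how md is actually implemented. `toy_write_stripe()`, the 3+1 XOR stripe layout, and `enum toy_policy` (playing the role of the per-sb tunable Boaz asks about) are all hypothetical. Parity is computed first, then a racing writer modifies one page, then the data chunks are "transferred" to disk. Unprotected, on-disk data and parity disagree; with a bounce copy they agree at the cost of copying the whole data load; with stable writes they agree because the writer is held off until the IO ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16   /* bytes per data chunk (toy size) */
#define NDATA 3    /* 3 data chunks + 1 XOR parity chunk */

/* What ends up "on disk" for one stripe. */
struct toy_stripe {
    uint8_t disk[NDATA][CHUNK];
    uint8_t parity[CHUNK];
};

/* Hypothetical per-sb policy, standing in for the proposed tunable. */
enum toy_policy { TOY_UNPROTECTED, TOY_BOUNCE_COPY, TOY_STABLE_WAIT };

/*
 * Write one stripe from the page-cache pages in `pages`. If racing_page >= 0,
 * an application rewrites that page after parity is computed but before the
 * data chunks are transferred.
 */
static void toy_write_stripe(struct toy_stripe *s, uint8_t pages[NDATA][CHUNK],
                             enum toy_policy policy, int racing_page)
{
    uint8_t bounce[NDATA][CHUNK];
    uint8_t (*src)[CHUNK] = pages;   /* where the "DMA" reads from */
    bool deferred = false;

    assert(racing_page < NDATA);

    if (policy == TOY_BOUNCE_COPY) {
        memcpy(bounce, pages, sizeof(bounce));  /* copy the complete data load */
        src = bounce;
    }

    /* 1. Compute XOR parity from the source buffers. */
    memset(s->parity, 0, CHUNK);
    for (int d = 0; d < NDATA; d++)
        for (int i = 0; i < CHUNK; i++)
            s->parity[i] ^= src[d][i];

    /* 2. Racing rewrite while IO is in flight. With stable writes the writer
     *    is held off (a kernel writer would sleep until writeback ends). */
    if (racing_page >= 0) {
        if (policy == TOY_STABLE_WAIT)
            deferred = true;
        else
            pages[racing_page][0] ^= 0xFF;
    }

    /* 3. Transfer the data chunks to disk. */
    for (int d = 0; d < NDATA; d++)
        memcpy(s->disk[d], src[d], CHUNK);

    /* 4. Writeback done: a blocked writer may now proceed. */
    if (deferred)
        pages[racing_page][0] ^= 0xFF;
}

/* True if the on-disk data chunks XOR to the on-disk parity. */
static bool toy_stripe_consistent(const struct toy_stripe *s)
{
    for (int i = 0; i < CHUNK; i++) {
        uint8_t x = 0;
        for (int d = 0; d < NDATA; d++)
            x ^= s->disk[d][i];
        if (x != s->parity[i])
            return false;
    }
    return true;
}
```

In both protected modes the disk ends up with the pre-rewrite contents; in a real system the rewritten page would presumably be redirtied and go out on a later writeback.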
From: Jan Kara on 6 Jun 2010 19:40

On Sun 06-06-10 12:35:03, Boaz Harrosh wrote:
> On 06/04/2010 07:23 PM, Jan Kara wrote:
> > On Thu 03-06-10 19:09:52, Boaz Harrosh wrote:
> >> [Topic]
> >> How to not let pages change while in IO
> >>
> >> [Abstract]
> >> As seen in a long thread on the fsdevel and scsi mailing lists, lots of
> >> people have headaches and sleepless nights because individual pages
> >> can change while in IO and/or DMA. Though each one has slightly different
> >> needs, the mechanics look to be the same.
> >
> > Hmm, I don't think it's really about "how to not let pages change" - that
> > is doable by using wait_on_page_writeback() in ->page_mkwrite and
> > ->write_begin. I think the discussion is more about whether we should do it,
> > or whether we should rechecksum and resubmit IO in case of checksum failure,
> > as Nick proposed...
> >
> > Honza
>
> I have hijacked the DIF threads, but no, my proposal is for a general
> toolset that could be used for all of the above, as well as for DIF if needed.
>
> Surely, even with DIF, keep-constant vs. retransmit is a matter of
> machine + link speed multiplied by faulting workloads. So there might be
> situations where an admin wants to choose.
>
> With other, non-checksum features like RAID5/MIRROR, retransmit is not
> always an option, and it becomes keep-constant vs. copy (that is, a copy of
> the complete workload). So for these setups the choice is clear, no?
Is it? You may well have enough CPU / memory bandwidth to do the copying,
while not being comfortable with a thread blocking until the IO is finished
when it tries to do a rewrite...

> I'm glad that you think it is easy/doable to implement, and I'll surely
> test your recipe above. Do you think it would be acceptable as a generic
> per-sb tunable? So that, for instance, ext3 over RAID5 could turn this on
> and eliminate the data copy?
Yes, that would be useful, at least so that one can get real performance
numbers...

Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
From: Boaz Harrosh on 7 Jun 2010 04:40
On 06/07/2010 02:37 AM, Jan Kara wrote:
>> With other, non-checksum features like RAID5/MIRROR, retransmit is not
>> always an option, and it becomes keep-constant vs. copy (that is, a copy of
>> the complete workload). So for these setups the choice is clear, no?
>
> Is it? You may well have enough CPU / memory bandwidth to do the copying,
> while not being comfortable with a thread blocking until the IO is finished
> when it tries to do a rewrite...
>
>> I'm glad that you think it is easy/doable to implement, and I'll surely
>> test your recipe above. Do you think it would be acceptable as a generic
>> per-sb tunable? So that, for instance, ext3 over RAID5 could turn this on
>> and eliminate the data copy?
>
> Yes, that would be useful, at least so that one can get real performance
> numbers...
>
> Honza

Thanks, Jan. You have helped me tremendously. I think I am beginning to
understand what I need to do.

With the workloads I care about (HPC), every cycle and every byte of memory
counts, and having the app wait on a rewrite is a good thing. That reminds me
that I would want to trace that case, so applications could be fixed and
tuned.

I do understand that for a desktop it might be just the opposite, so testing
is important. Perhaps I'll need help in instrumenting all this.

Thanks
Boaz