From: Micha Nelissen on 10 Aug 2010 16:00

Hi all,

Why is get_user_pages much slower than taking the faults? (I would expect it to be faster).

Attached example program first mallocs a piece of memory (64MB in this case) then reads it "to take the faults". Afterwards, it uses mmap with MAP_POPULATE to "speed up" and not to have to take the faults, but have everything mapped in one go. I think mmap is using get_user_pages in this case.

$ ./memspeed
malloc took 0 msecs
read took 14 msecs
write took 0 msecs
free took 1 msecs
mmap took 45 msecs
munmap took 5 msecs

Using MAP_POPULATE is 3 times as slow as the 'stupid' implementation! I'm running a Core 2 duo e6300 system with linux 2.6.28.4.

Am I doing something wrong? MAP_POPULATE seems a bit of a joke to me.

Thanks,

Micha
From: Kevin Easton on 12 Aug 2010 00:00

Quoting Micha Nelissen <micha(a)neli.hopto.org>:

> Hi all,
>
> Why is get_user_pages much slower than taking the faults? (I would
> expect it to be faster).
>
> Attached example program first mallocs a piece of memory (64MB in
> this case) then reads it "to take the faults". Afterwards, it uses
> mmap with MAP_POPULATE to "speed up" and not to have to take the
> faults, but have everything mapped in one go. I think mmap is using
> get_user_pages in this case.
>
> $ ./memspeed
> malloc took 0 msecs
> read took 14 msecs
> write took 0 msecs
> free took 1 msecs
> mmap took 45 msecs
> munmap took 5 msecs
>
> Using MAP_POPULATE is 3 times as slow as the 'stupid'
> implementation! I'm running a Core 2 duo e6300 system with linux
> 2.6.28.4.
>
> Am I doing something wrong? MAP_POPULATE seems a bit of a joke to me.

Hi Micha,

Yep, you are. Because your pointer 'p' is a pointer to int, when you increment it by 0x1000 in your loops you are actually incrementing it by 0x1000 * sizeof(int) bytes, so you're only actually touching one page in four.

If you change the types of 'buf', 'p' and 'e' to 'char *' then it touches every page, and (at least on my test box) the MAP_POPULATE case pulls ahead.

- Kevin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Micha Nelissen on 12 Aug 2010 06:10
Hi Kevin,

Kevin Easton wrote:
> Yep, you are. Because your pointer 'p' is a pointer to int, when you
> increment it by 0x1000 in your loops you are actually incrementing it by
> 0x1000 * sizeof(int) - so you're only actually touching one page in four.

Oops sorry, thanks for catching my mistake.

I also discovered the following: if I read from all pages and then call get_user_pages, it is still quite slow (did I get a read-only page?). However, if I touch all pages by writing to them, then get_user_pages becomes about 40 times faster.

All is clear now, I think. Thanks.

Micha