From: Jeff Koftinoff on 18 Sep 2006 15:30

Wu Yongwei wrote:
> Why do you say `no exception thrown'? I would expect a std::bad_alloc,
> and, when it is not caught, an abort().

Your expectation would be correct on a conforming system. However, any
system that employs lazy memory allocation (allocation of memory pages
at page-fault time instead of at malloc() time) behaves differently. I
have written a simple test program, available at

https://clicker.jdkoftinoff.com/projects/trac/jdks/wiki/alloctests

which runs some stress tests that show the problems.

> > Depending where this code is used, it can be the basis of a security
> > hole - and in that sense fgets() could be a better solution!
>
> Yes, this is a problem, but I do not see it as a security hole. Every
> program can run out of memory for some kind of input, including Firefox
> and Internet Explorer. I often see Firefox occupy more than 500 MB of
> memory, which makes me feel it necessary to restart it after viewing a
> big page. I am sure it is possible to make a big page that crashes them.

It is a security hole when it combines with the lazy-allocation
problem. It allows an untrusted user to kill servers, potentially even
other, unrelated admin servers running on the same system. Yes, it is a
problem with those systems' designs, but it is a real one that affects
many real servers. It cannot be stressed enough that catching
std::bad_alloc is not always sufficient!

> When this problem could be a real issue, there are ways to work around
> it. For example, using custom allocators. The point is that C++ does
> not strangely force an arbitrary limit on how long a line could be.
> And the system limitation could be put somewhere else than in the
> processing logic.

Unfortunately, the interfaces of getline() and std::string do not allow
me to put a limit on how long the line can be.

And if I may use this space to respond to James Kanze's comment:

James Kanze wrote:
> I agree that a version of the function with a maximum length
> would be nice. Or simply specifying that extractors respect
> setw too---that would be useful for a lot of other things as
> well. But I don't see it really as a security hole, any more
> than the possibility of any user function to allocate more
> available memory than it should is. If untrusted users have
> access to your program, you'll have taken the necessary steps
> outside the program to prevent DOS due to thrashing.

What would the necessary steps be, then, to ensure a maximum line
length in my protocol parsers that use iostream and std::string? Should
I write my own getline()? Or read one character at a time? I, for one,
would love to have a std::string class with an option to set a maximum
allowable length.

Regards,
Jeff Koftinoff
www.jdkoftinoff.com
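A bounded getline of the kind asked for above takes only a few lines to
write by hand. The following is a minimal sketch, not library code; the
name bounded_getline and the policy of leaving excess characters in the
stream for the next call are choices made for this example:

    #include <istream>
    #include <string>

    // Read characters into 'line' until '\n', end of input, or
    // max_len stored characters.  The '\n' is consumed but not
    // stored, as with std::getline.  If the line is longer than
    // max_len, the surplus stays in the stream for the next call.
    // (Unlike std::getline, a final line without a trailing '\n'
    // leaves the stream in a failed state even though 'line' holds
    // its characters; a production version would refine this.)
    std::istream& bounded_getline(std::istream& is, std::string& line,
                                  std::string::size_type max_len)
    {
        line.clear();
        char ch;
        while (line.size() < max_len && is.get(ch)) {
            if (ch == '\n')
                break;
            line += ch;
        }
        return is;
    }

As with std::getline, a caller can then write
while (bounded_getline(std::cin, line, 4096)) { ... } and the loop
terminates at end of input, with memory growth capped at 4096 bytes.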
From: kanze on 19 Sep 2006 13:16

Jeff Koftinoff wrote:
> Wu Yongwei wrote:
> > Why do you say `no exception thrown'? I would expect a
> > std::bad_alloc, and, when it is not caught, an abort().
>
> Your expectation would be correct on a conforming system.
> However, any system that employs lazy memory allocation
> (allocation of memory pages at page-fault time instead of at
> malloc() time) behaves differently.

Yes, but you really shouldn't allow such machines to be connected to
the Internet. It's the overcommit which is the problem, not the code
using getline(). Normally, any Linux machine connected to the network
should have the value 2 in /proc/sys/vm/overcommit_memory. (I would, in
fact, recommend this on any Linux system, unless some of the
applications being run require overcommit or were designed with
overcommit in mind. The process which gets killed isn't necessarily the
one using too much memory; I have heard of at least one case where a
critical system process needed for login was killed.)

Note that not having overcommit isn't a panacea either. I can remember
Solaris 2.2 hanging for around 5 minutes, thrashing like crazy but not
advancing in any visible way, in one of the stress tests I ran on it.
(This seems to have been fixed in later versions. At least, my stress
test caused no problems with Solaris 2.4.)

[...]

> It is a security hole when it combines with the lazy-allocation
> problem.

Overcommit is a very serious security hole, which has nothing to do
with getline or fgets (unless you are considering the possibility of
writing the entire program without using any dynamic memory).

> It allows an untrusted user to kill servers, potentially even
> other unrelated admin servers running on the same system.

The untrusted user doesn't get a choice with regard to which processes
are killed. For that matter, nor does the trusted user :-). Any time
you run a system with overcommit, any program which uses dynamic
memory, OR forks, OR does any one of a number of other things which
could cause memory to be allocated, may crash the system any time it
runs. Common sense says that you don't run anything important on such
machines.

> Yes, it is a problem with those systems' designs. But it is a
> real one that affects many real servers. It cannot be stressed
> enough that catching std::bad_alloc is not always sufficient!

IIRC, Andy Koenig wrote an article about the general problem a long
time ago, in some OO journal. In general: except on specially designed
systems, you can't count on catching bad_alloc and recovering; there
are generally cases of memory allocation failure that escape its
detection (the stack is an obvious example). On the other hand, many
programs don't require 100% certainty of catching it, and on a well
designed system, if you exceed available memory and don't manage to
catch it, the system will still shut you down cleanly and free up most
resources. Also, if you're familiar with the system, you may be able to
avoid some of the problem areas---I've written programs for Solaris
where I could guarantee no insufficient memory due to stack overflow
after start-up.

> > When this problem could be a real issue, there are ways to
> > work around it. For example, using custom allocators. The
> > point is that C++ does not strangely force an arbitrary limit
> > on how long a line could be. And the system limitation could
> > be put somewhere else than in the processing logic.
>
> Unfortunately, the interfaces of getline() and std::string do
> not allow me to put a limit on how long the line can be.
> And if I may use this space to respond to James Kanze's comment:
> James Kanze wrote:
> > I agree that a version of the function with a maximum length
> > would be nice. Or simply specifying that extractors respect
> > setw too---that would be useful for a lot of other things as
> > well. But I don't see it really as a security hole, any more
> > than the possibility of any user function to allocate more
> > available memory than it should is. If untrusted users have
> > access to your program, you'll have taken the necessary steps
> > outside the program to prevent DOS due to thrashing.
>
> What would the necessary steps be, then, to ensure a maximum
> line length in my protocol parsers that use iostream and
> std::string?

One obvious solution would be to overload getline with an additional
parameter specifying the maximum length. A perhaps more general
solution would be to systematically recognize the width parameter on
input---this would also be useful for reading files with fixed-width
fields, rather than separators. (Note that depending on the
implementation, inputting with >> into an int might suffer similar
problems if you feed it too many digits. I would expect any good
implementation to stop storing digits once it recognizes that it has
more than enough, but all the standard says is that if you feed it a
number which is too big, you have undefined behavior.)

> Should I write my own getline()? Or read one character at a
> time?

It depends on what you are doing. I'd start by getting rid of
overcommit :-). AIX stopped using it by default a long time ago (but
you can still turn it on, on a user by user basis, if you need it).
Most Linux distributions seem to default to using it (which seems
fairly irresponsible), but it's easy to turn off (globally). Solaris
and HP/UX don't use it. So you're safe on the major Unix platforms.
(Tru64 Unix for Alpha? SGI? I don't know.)

After that, it depends on the application, and what you're using the
standard streams for. My applications are mainly large, reliable
servers, and istream is used only for reading the configuration
file---if it crashes, it crashes during initialization, due to an
operator error (bad config file), but the server doesn't go down once
it is running. Unless it hits some odd case in the system or the system
library that I couldn't protect against. I also write a lot of little
quickie programs for my own use. There too, if they crash because of an
excessively long line, it's no big deal. (But if I were serious about
it, I'd set up a new_handler to emit a nice message before terminating
the program.)

> I, for one, would love to have a std::string class that had an
> option of setting a maximum allowable length.

That's another issue. In many applications it would, in fact, be useful
to be able to impose such a limit.
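The new_handler mentioned above takes only a few lines to set up. A
minimal sketch (the wording of the message is arbitrary; fputs is used
rather than iostreams because the latter might itself try to allocate):

    #include <cstdio>
    #include <cstdlib>
    #include <new>

    // Called by operator new when an allocation fails.
    void out_of_memory()
    {
        std::fputs("fatal: out of memory\n", stderr);
        std::abort();
    }

    int main()
    {
        std::set_new_handler(out_of_memory);
        // ... rest of the program ...
    }

Note that on a system which overcommits, the handler may never run: the
process is killed at page-fault time, not at allocation time, which is
exactly the problem discussed above.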
From: Hendrik Schober on 19 Sep 2006 13:20

crhras <crhras(a)sbcglobal.net> wrote:
> > Wow! I just used two different file IO methods and the performance
> > difference was huge.
>
> I should have mentioned that I'm using Borland Studio 2006 on Windows
> XP Pro. At this point, I think it might be caused by a flaw in the way
> getline() is implemented by Borland. I am going to post this question
> at borland.public.cppbuilder.language.cpp and if I discover anything
> there I'll post it here.

AFAIK Borland Studio now uses Dinkumware for its standard library. Pete
Becker, who has already answered in this thread, might have written
this very function, so I doubt you will get better answers in
b.p.cb.l.cpp. It has been pointed out to you several times that
'std::getline()' does significantly more than 'fgets()' and that the
latter is more comparable to 'std::istream::getline()'.

> Thanks again for everyone's responses.

Schobi

--
SpamTrap(a)gmx.de is never read
I'm Schobi at suespammers dot org
"The sarcasm is mightier than the sword." Eric Jarvis
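For readers following along, the difference between the two functions
is easy to show side by side. A minimal illustration (the file name is
made up for the example):

    #include <fstream>
    #include <iostream>
    #include <string>

    int main()
    {
        std::ifstream in("input.txt");   // hypothetical input file

        // std::getline: grows the string as needed; no upper bound.
        std::string line;
        while (std::getline(in, line))
            std::cout << line << '\n';

        in.clear();                      // reset eofbit before seeking
        in.seekg(0);

        // std::istream::getline: fills a fixed buffer, much like
        // fgets(), but sets failbit if a line does not fit.
        char buf[512];
        while (in.getline(buf, sizeof buf))
            std::cout << buf << '\n';
    }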
From: crhras on 20 Sep 2006 11:09

> will get better answers in b.p.cb.l.cpp. It has been
> pointed out to you several times that 'std::getline()'
> does significantly more than 'fgets()' and that the
> latter is more comparable to 'std::istream::getline()'.

You are correct. I didn't get better responses in b.p.cb.l.cpp. And I
understand that getline() does "significantly" more than fgets(). But
does it do 4000 percent more? Remember, my tests showed roughly 5
seconds for fgets() versus over 200 for getline().
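Anyone who wants to reproduce such a measurement can do so with a few
lines. A rough harness along these lines (the file name and buffer size
are arbitrary, and std::clock() has coarse resolution, so a large test
file is needed):

    #include <cstdio>
    #include <ctime>
    #include <fstream>
    #include <string>

    int main()
    {
        std::clock_t t0 = std::clock();
        {
            std::ifstream in("big.txt");   // hypothetical test file
            std::string line;
            while (std::getline(in, line))
                ;                          // read and discard
        }
        std::clock_t t1 = std::clock();
        {
            std::FILE* fp = std::fopen("big.txt", "r");
            char buf[4096];
            while (fp && std::fgets(buf, sizeof buf, fp))
                ;                          // read and discard
            if (fp)
                std::fclose(fp);
        }
        std::clock_t t2 = std::clock();

        std::printf("getline: %.2fs  fgets: %.2fs\n",
                    double(t1 - t0) / CLOCKS_PER_SEC,
                    double(t2 - t1) / CLOCKS_PER_SEC);
    }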
From: kanze on 27 Sep 2006 09:04
Earl Purple wrote:
> Hendrik Schober wrote:
> > That depends on 'std::getline()'s implementation, the
> > compiler's ability to inline/optimize, the memory manager
> > used and probably a lot more.
>
> In the example above he is re-using the same std::string. I
> would hope that getline called multiple times would attempt to
> use the string's already-allocated buffer if it has one, and
> therefore reallocations would only happen when you're reading a
> line that is longer than any you have encountered previously.
> If he initially reserves 512 in the string, assuming none of
> the lines are longer than that, then no reallocations would be
> necessary at all.

There is no way to access the string's already-allocated buffer from
outside the string, so this would require getline to be a friend of
std::string. I rather doubt that it is in most implementations. (A
quick grep on the preprocessor output of a program which included
<string> showed no friend in the g++ implementation.)

> Besides that, I would assume that both methods would write to a
> buffer before writing directly to the string / char-array.

I presume you are talking about the buffering done in filebuf.

> I would hope that it reads ahead to find the terminating '\n'
> character before checking the allocation length of the
> std::string.

I'm not sure how you propose to implement this. getline doesn't have
access to the internals of filebuf, any more than it does to the
internals of std::string. And the buffer in filebuf has a fixed length
anyway, so there's no guarantee that you'd find the '\n' in it.

--
James Kanze                                     GABI Software
Conseils en informatique orientée objet /
                Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
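Earl Purple's reserve() suggestion is easy to try. Whether it avoids
reallocations is implementation-dependent (getline empties the string
but is not required to release its capacity), so the sketch below
prints the capacity on each iteration to let you observe what your
library actually does (the file name is made up; 512 is the figure from
the discussion):

    #include <fstream>
    #include <iostream>
    #include <string>

    int main()
    {
        std::ifstream in("input.txt");   // hypothetical input file

        std::string line;
        line.reserve(512);               // hope: most lines fit in 512

        // If capacity stays at 512 (or grows only when a longer line
        // than any seen so far arrives), the implementation is
        // reusing the string's buffer across calls.
        while (std::getline(in, line))
            std::cout << line.capacity() << ' ' << line.size() << '\n';
    }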