From: Lasse Collin on 19 Jun 2010 09:41

On 2010-06-19 Matthias Andree wrote:
> We have system facilities for limiting resources, including those
> that limit virtual memory.

Limiting virtual memory with "ulimit -v" is generally not so great. It cripples mmap(), which can use a lot of virtual memory while using little actual RAM. If I made xz use mmap() for handling input files when possible, limiting virtual memory would have little to do with limiting the actual memory usage of xz: if xz mmapped a 280 MiB file that needs 65 MiB of memory to decompress, xz would run out of memory if virtual memory was capped to 300 MiB. Luckily for you I don't plan to use mmap() in xz. :-)

Perhaps FreeBSD provides a good working way to limit the amount of memory that a process actually can use. I don't see such a way e.g. in Linux, so having some method in the application to limit memory usage is definitely nice. It's even more useful in the compression library, because a virtual-memory-hog application on a busy server doesn't necessarily want to use tons of RAM for decompressing data from untrusted sources.

> For compression, it's less critical because service is degraded, not
> denied, but I'd still think -M max would be the better default. I can
> always put "export XZ_OPT=-3" in /etc/profile.d/local.sh or wherever
> it belongs on the OS of the day.

If a script has "xz -9", it overrides XZ_OPT=-3.

> I still think utilities and applications should /not/ impose
> arbitrarily lower limits by default though.

There's no multithreading in xz yet, but when there is, do you want xz to use as many threads as there are CPU cores _by default_? If so, do you mind if compressing with "xz -9" used around 3.5 GiB of memory on a four-core system no matter how much RAM it has?

I think it is quite obvious that you want the number of threads to be limited so that xz won't accidentally exceed the total amount of physical RAM, because then it is much slower than using fewer threads. Being faster is the whole point of threading anyway.

Naturally, doing unusual things is sometimes wanted, so a limit can be overridden. This is all about the default behavior only.

In most cases, lowering the compression settings automatically is friendly towards the user. People easily write "xz -9" in scripts without thinking about whether they actually want that, because they are used to -9 with gzip and bzip2. I would find it dumb to annoy users of slightly older hardware with _default behavior_ that puts their system to swap whenever such a script is run. They can still get the swap-till-the-morning behavior if they really want it by disabling the limit when compressing by using XZ_OPT.

> > Disabling the limiter completely by default doesn't seem like an
> > option, because it would only change who will be annoyed. Comments
> > are very welcome. Thanks.
>
> It is a necessity to change it.

In short, some people find a default limit annoying and some other people would find the lack of a default limit annoying. (And most people probably don't care.) So the question is, which group will complain more; obviously I cannot make everyone happy. At this point it starts to look like your group is winning. ;-) I will have to discuss with people in the other group before making decisions.

-- 
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
_______________________________________________
freebsd-ports(a)freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-ports
To unsubscribe, send any mail to "freebsd-ports-unsubscribe(a)freebsd.org"
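
The mmap() example in the message above is plain arithmetic, and a small sketch may make it easier to follow. It only models the accounting: a cap on virtual memory (as with "ulimit -v") counts the mapped file in full, even though the RAM actually needed is just the decompressor's working set. The function name and the simple additive model are assumptions made for illustration, not anything xz does.

```python
MIB = 1024 * 1024


def fits_under_virtual_cap(mapped_bytes, working_bytes, cap_bytes):
    """Return True if mapped file plus working memory stays within a virtual-memory cap.

    Simplified model: the cap counts every mapped byte, whether or not the
    pages are resident in RAM.
    """
    return mapped_bytes + working_bytes <= cap_bytes


# Figures from the message: a 280 MiB mmapped file, 65 MiB to decompress,
# and a 300 MiB cap on virtual memory.
with_mmap = fits_under_virtual_cap(280 * MIB, 65 * MIB, 300 * MIB)

# The same job reading the file through ordinary buffered I/O maps nothing.
without_mmap = fits_under_virtual_cap(0, 65 * MIB, 300 * MIB)

print("fits with mmap:", with_mmap)
print("fits without mmap:", without_mmap)
```

Under this model the mmapped case needs 345 MiB of address space and is refused, while the process would only ever have needed about 65 MiB of RAM.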
From: Matthias Andree on 20 Jun 2010 09:33

On 19.06.2010 15:41, Lasse Collin wrote:
> Perhaps FreeBSD provides a good working way to limit the amount of
> memory that a process actually can use. I don't see such a way e.g. in
> Linux, so having some method in the application to limit memory usage is
> definitely nice. It's even more useful in the compression library,
> because a virtual-memory-hog application on a busy server doesn't
> necessarily want to use tons of RAM for decompressing data from
> untrusted sources.

Even there the default should be "max", and the library SHOULD NOT second-guess the trust level of the data the application might process with libxz's help. Expose the limiter interface in the API if you want, but for the library in particular, any default other than "unlimited memory" is a nuisance.

And there's still an application, and unlike the xz library, the application should know what kind of data from what sources it is processing, and if - for instance - a virus inspector wants to impose memory limits and quarantine an attachment with what looks like a zip bomb, it can do so.

Typically, since the advent of KDE, GNOME, XFCE, and the like with all their graphical tools, people hardly use command line tools unless they know exactly what they're doing - and it's not as though xz's behaviour were prone to causing permanent damage somewhere, so it's OK if less skilled users find out the hard way that they need to read manpages.

I was surprised because xz has somewhat left the traditional UNIX way, which was: try exactly as little and as hard as you were told, until you bump into a brick wall (permission denied on some file, or memory allocation failed, or similar). Don't try to be nice unless you're asked to. Don't try to ask questions unless you're asked to be "interactive".

All your default limits make using the utility unnecessarily hard and harder to explain, because the default limits are surprising and cause self-made failures. And I think many people would just lay xz or the library aside when they figure out they need to do this and that and foo and bar and torture a black cat, swinging it over their heads, and dance strange figures in the sewers on a full-moon night, just so that xz or the library finally condescends to decompress a file.

I am exaggerating here, but please, don't make me jump through hoops with my application or script. I'd say a typical application wants to call xzopen() and decompress a file, and if it wants to impose limits it will use setrlimit() or perhaps xz_set_memory_limit in addition beforehand.

I do fear that this will actually hamper, not foster, adoption of the xz software.

>> For compression, it's less critical because service is degraded, not
>> denied, but I'd still think -M max would be the better default. I can
>> always put "export XZ_OPT=-3" in /etc/profile.d/local.sh or wherever
>> it belongs on the OS of the day.
>
> If a script has "xz -9", it overrides XZ_OPT=-3.

I know. This isn't a surprise for me. The memory limiting, however, is. And the memory limiting overrides xz -9 to something lesser, which may not be what I want either.

>> I still think utilities and applications should /not/ impose
>> arbitrarily lower limits by default though.
>
> There's no multithreading in xz yet, but when there is, do you want xz
> to use as many threads as there are CPU cores _by default_? If so, do
> you mind if compressing with "xz -9" used around 3.5 GiB of memory on a
> four-core system no matter how much RAM it has?

Multithreading in xz is worth discussing if the tasks can be parallelized, which is apparently not the case. You would be duplicating effort, because we have tools to run several xz processes on distinct files at the same time, for instance BSD portable make or GNU make with a "-j" option.
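
The "open and decompress, and opt in to a limit only if the application wants one" pattern from the message above can be sketched with Python's standard lzma module, which wraps liblzma. The xzopen() and xz_set_memory_limit names in the message look like placeholders rather than real API names; the helper below, decompress_with_limit, is likewise a hypothetical name used only for this illustration. With memlimit=None the decoder runs unlimited, which is the default being argued for.

```python
import lzma


def decompress_with_limit(blob, memlimit=None):
    """Decompress .xz data in one call.

    memlimit=None means no limit; an integer is a cap in bytes that the
    application chooses to impose on the decoder.
    """
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    return decoder.decompress(blob)


payload = b"example data " * 1000
blob = lzma.compress(payload, preset=6)

# Default: no limit, the file is simply decompressed.
unlimited = decompress_with_limit(blob)

# An application handling untrusted input can opt in to a generous cap.
capped = decompress_with_limit(blob, memlimit=64 * 1024 * 1024)

# A cap far below what the decoder needs makes decompression fail.
try:
    decompress_with_limit(blob, memlimit=1024)
    tiny_limit_ok = True
except lzma.LZMAError:
    tiny_limit_ok = False

print("tiny limit accepted:", tiny_limit_ok)
```

The design point in this sketch is only that the limit is a parameter the caller sets deliberately; nothing is imposed unless the application asks for it.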
> I think it is quite obvious that you want the number of threads to be
> limited so that xz won't accidentally exceed the total amount of
> physical RAM, because then it is much slower than using fewer threads.

This tells me xz cannot fully parallelize its effort on the CPUs, and should be single-threaded so as not to waste the parallelization overhead.

> Being faster is the whole point of threading anyway. Naturally doing
> unusual things is sometimes wanted so a limit can be overriden. This is
> all about the default behavior only.

Yes, and I consider the default behaviour to be "getting in my way" and disturbing. No other compression tool I know would ever spend that much thought on its working environment. All others will only fail if it's "physically" impossible to complete the job, and otherwise just grind away.

> In most cases, lowering the compression settings automatically is
> friendly towards the user. People easily write "xz -9" to scripts
> without thinking if they actually want that, because they are used to -9
> with gzip and bzip2.

-9 is quite slow in bzip2; with gzip, computers have become fast enough that -9 hardly hurts today, but I recall times when I thought twice before using gzip -9 rather than gzip -3. People know there is a price tag attached, else the whole option system would be useless and --best would be the only option.

> I would find it dumb to annoy users of slightly
> older hardware with _default behavior_ that puts their system to swap
> whenever such a script is ran. They can still get the swap-till-the-
> morning behavior if they really want it by disabling the limit when
> compressing by using XZ_OPT.

This is really xz developing a life of its own.

Look: If I specify -9 or --best, but no memory option, that means "compress as hard as you can". Instead, xz assumes an implicit default memory limit, so the -9 gets degraded to -5, -2, -1, ... in a somewhat surprising manner, because depending on which computer I run it on, -9 might mean -9, or -6 on another computer, or -1 on a third. That is what I'd call a nasty surprise - xz overrides my command line option.

I would propose that with -9 and without the -M option, it tries to allocate memory, and if that fails, it can still suggest using the -M option or a lower -[0-8] option so I know how to proceed.

> In short, some people find a default limit annoying and some other
> people would find lack of default limit annoying. (And most people
> probably don't care.) So the question is, which group will complain
> more; obviously I cannot make everyone happy. At this point it starts to
> look like that your group is winning. ;-) I will have to discuss with
> people in the other group before making decisions.

The real thing is that the xz software does things that go beyond the options. This has been a negative surprise to me. I would also like to recall my earlier argument that xz is a low-level tool, and it's mixing high-level features in. This is astonishing.

I think that some of the defaults you've set were trying to address usability concerns, and there are other ways to achieve this usability. Often an alternative proposal together with a diagnostic suffices. Helping towards self-aid, in a way.

I hope - egoistically - that xz will lean towards easier use in infrastructure (think build systems or applications using libraries), rather than assisting newbies. The more consistent the world around Unix-compatible and -environment tools is, the easier people will learn. It keeps the documentation simpler because there are not so many ifs and buts, and in the end I think it will pay off to change the default.

Best regards
Matthias
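
The two behaviours being argued over can be contrasted in a few lines. The sketch below is only a model: real allocation failure is replaced by a comparison against a number, and the memory table is hypothetical except for the roughly 100 MiB figure for -6 that appears elsewhere in this thread. The names choose_preset, MemoryLimitError and HYPOTHETICAL_ENCODER_MIB are invented for the illustration.

```python
class MemoryLimitError(Exception):
    """Raised when the requested preset does not fit and degrading is not allowed."""


# Placeholder encoder memory needs in MiB per preset. Only the -6 value
# (about 100 MiB) comes from the thread; the others are made-up stand-ins.
HYPOTHETICAL_ENCODER_MIB = {1: 10, 3: 30, 6: 100, 9: 700}


def choose_preset(requested, limit_mib, encoder_mib, degrade):
    """Return the preset to use under a memory limit.

    degrade=True  models the contested default: silently fall back to the
                  highest preset that fits.
    degrade=False models the proposal: fail with a diagnostic that tells
                  the user how to proceed.
    """
    needed = encoder_mib[requested]
    if limit_mib is None or needed <= limit_mib:
        return requested
    if degrade:
        fitting = [p for p in encoder_mib
                   if p <= requested and encoder_mib[p] <= limit_mib]
        if fitting:
            return max(fitting)
    raise MemoryLimitError(
        f"preset -{requested} needs about {needed} MiB but only "
        f"{limit_mib} MiB is allowed; raise the limit with -M or use a lower preset"
    )


degraded = choose_preset(9, 128, HYPOTHETICAL_ENCODER_MIB, degrade=True)

try:
    choose_preset(9, 128, HYPOTHETICAL_ENCODER_MIB, degrade=False)
    strict_message = None
except MemoryLimitError as exc:
    strict_message = str(exc)

print("degrading policy picked preset:", degraded)
print("strict policy said:", strict_message)
```

With the degrading policy the same "-9" yields different presets on different machines; with the strict policy the outcome is either exactly what was requested or an error that names the options to change.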
From: Lasse Collin on 20 Jun 2010 17:04

On 2010-06-20 Ion-Mihai Tetcu wrote:
> Personally I'd suggest keeping the option to limit the memory, but as
> an option, not as default.

OK.

> One thing I would really love to see going away is the default to
> delete the archive on decompression.

Being somewhat compatible with gzip and bzip2 command line syntax is useful, so even though I don't disagree with you, the default is and will be to delete the input file.

> Generally, I think programs should support both, the latter overriding
> the first: .conf -> env -> command line

It means that I will need to create a config file on all my computers that have 512 MiB RAM or less to get the behavior I want. Probably other users with older computers have to do that too to avoid insanely slow compression and an unresponsive system when some script runs "xz -9". While I would prefer no need for a config file, people like me seem to be in a minority, and creating a config file isn't that big a deal.

Using a second environment variable would be quite similar. Only the place where the setting is put would differ. A config file could allow more flexibility though, e.g. it could be possible to even override the preset levels with user-defined custom values (at his or her own risk, of course).

> At the moment, what are the plans and the advantages of multithreading
> (both on compression and decompression)?

The "only" advantage is that threading makes things faster when there are multiple CPU cores to use.

Disadvantages of threading:

- Compression ratio might be worse. It depends on how the threading is done. Different ways have their own pros and cons.
- Memory usage may be a lot higher for both compression and decompression.

The plan is to get some type of threaded compression support into liblzma after the 5.0.0 release. Considering my free time etc. I don't promise any kind of development schedule. The API will be done so that applications won't need to think about the details of threading too much, and can use the zlib-style loop like they do in single-threaded mode.

> > Next question could be how to determine how many threads could be
> > OK for multithreaded decompression. It doesn't "fully" parallelize
> > either, and would be possible only in certain situations. There
> > too the memory usage grows quickly when threads are added. To me,
> > a memory usage limit together with a limit on number of threads
> > looks good; with no limits, the decompressor could end up reading
> > the whole file into RAM (and swap). Threaded decompression isn't
> > so important though, so I'm not even sure if I will ever implement
> > it.
>
> I'd say offer an option if you want.

Sorry, I explained this poorly. A simple "number of threads = something" setting is not good for threaded decompression. In a generic situation you don't know beforehand how much RAM each decompressor thread would use.

If threaded decompression is implemented, maybe the default should be one thread just to keep things simple. But there should be an option to use the optimal number of threads so that the user doesn't need to worry about details too much. My idea for that would be to have a user-specified maximum number of threads and a memory usage limit. Then xz would use up to the allowed number of threads as long as the memory usage limit is not exceeded. Without a memory usage limit, memory usage could grow to insane amounts if there are very many cores.

It's somewhat similar for threaded compression, except that the amount of memory needed per thread at the given compression level is known before the compression is started. An option to easily tell xz to use the optimal number of threads would be useful e.g. in scripts, which may be used on different computers, and thus don't want to be bothered to figure out how many CPU cores there are. I think a thread limit combined with a memory usage limit is reasonable here too.
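
For the compression case, where the per-thread memory need is known up front, the "thread limit combined with memory usage limit" idea reduces to a small calculation. The sketch below is one possible reading of it, not xz's implementation: pick_thread_count is a hypothetical name, falling back to a single thread when even one does not fit is an assumption, and the 896 MiB per-thread figure is simply 3.5 GiB divided by four cores from the earlier "xz -9" example.

```python
def pick_thread_count(max_threads, mem_limit_mib, per_thread_mib):
    """Return how many threads to use under a thread cap and a memory cap.

    max_threads     user-specified upper bound (for example the core count)
    mem_limit_mib   memory usage limit in MiB, or None for no limit
    per_thread_mib  memory one compression thread needs at the chosen level
    """
    if max_threads < 1 or per_thread_mib <= 0:
        raise ValueError("max_threads must be >= 1 and per_thread_mib > 0")
    if mem_limit_mib is None:
        return max_threads
    affordable = mem_limit_mib // per_thread_mib
    # Assumption: never go below one thread, even if that one thread
    # exceeds the limit used for this calculation.
    return max(1, min(max_threads, affordable))


# Four cores, about 896 MiB per thread (3.5 GiB / 4), 2 GiB limit.
threads_with_limit = pick_thread_count(4, 2048, 896)

# No memory limit: every allowed thread is used.
threads_without_limit = pick_thread_count(4, None, 896)

# Limit below one thread's need: drop to single-threaded mode.
threads_low_memory = pick_thread_count(4, 512, 896)

print(threads_with_limit, threads_without_limit, threads_low_memory)
```

Decompression would need a different approach, since the message notes that the per-thread memory is not known beforehand in the generic case.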
For the above use, there should be default values for the thread and memory limits, so that a config file or many command line options wouldn't be strictly required to get some threading with the "use optimal number of threads" setting. Number of CPU cores and some percentage of RAM could work. Users could set better values themselves, but defaults are still a nice starting point and may be enough for many.

Note that if I remove the current default memory usage limit from xz, the default memory usage limit used to calculate the optimal number of threads wouldn't be used for anything else; if the limit is too low, xz would just drop to single-threaded mode to use a minimal amount of RAM.

> We've pondered a bit about switching our packages from .tbz to .xz or
> tar.xz. Given that a package is made once, and downloaded and
> decompressed by a lot of users a lot of times, it would probably make
> sense to go for the smallest possible size;

I had the same reasoning when I got interested in LZMA in 2004. LZMA was also much faster to decompress than bzip2.

Slackware uses the .txz suffix for .tar.xz packages, so if you prefer a single three-letter suffix instead of .tar.xz, .txz is the way to go.

> however, if this would mean that some users won't be able to
> decompress the packages, then probably xz isn't the tool for us.

Decoder memory usage is all about the dictionary size. With a 2 MiB dictionary you can make most packages smaller with xz than with "bzip2 -9" while keeping the decoder memory usage (3 MiB) _lower_ than that of bzip2 (man page says 3700k without using the slower --small mode).

I would recommend using an 8 MiB dictionary for packages. That way 9 MiB of memory is needed to decompress. That's what I used for packages years ago, and it's also the default in xz ("xz -6"). A dictionary bigger than 8 MiB is not useful unless the uncompressed file is over 8 MiB. Using "xz -6e" might reduce the size a little more with some files, but it's not necessarily worth the extra CPU time.

Compressing with "xz -6" needs about 100 MiB of memory. It is much more than with "bzip2 -9" (man page says 7600k), but should be fine on the systems that create the packages.

Using "xz -9" for binary packages would be a bad choice. It doesn't save that much space over "xz -6" and can seriously annoy users of older computers. In contrast, decompressing files created with "xz -6" works nicely on a 100 MHz Pentium with 32 MiB RAM (16 MiB should be quite OK too). I will need to emphasize much more in the xz docs and possibly also in "xz --help" that using -9 really isn't usually what people want.

There are also additional filters that might help. Enabling them requires using advanced options. You can try e.g. "xz --x86 --lzma2" when compressing data that includes a significant amount of x86-32 or x86-64 code. That filter has a known problem that makes it perform poorly on static libraries (and Linux kernel modules), so applying it to all packages isn't necessarily a good idea. In the future (I don't know when), there will be a better and easier-to-use filter that will use heuristics to detect when and what extra filtering should be useful.

> Speaking of sizes, do you have any statistical data regarding: source
> size, compression options, compression speed and decompression speed
> (and memory usage, since we're talking about it)?

No. It's good to note here that I haven't so far worked much on the actual compression algorithms. The critical parts are directly derived from Igor Pavlov's LZMA SDK (the code may look very different at first sight, but don't let that mislead you).

As I mentioned in an earlier email, I will tweak the compression settings mapped to the compression levels before the 5.0.0 release. To do that I will need to collect some data from many different compression settings. It probably won't be high quality data, since I have limited time for experiments and I just need some rough guidelines to tweak the options.

Here are a few known things:

- Decompression speed is roughly a constant x bytes per second of _compressed_ data on the same machine. The better the compression has been, the faster the decompression tends to be. However, if the data doesn't fit in RAM and the system needs to swap out parts of the xz process, old floppy disks start to become competitive, because the memory is accessed quite randomly.
- The dictionary keeps the most recently processed uncompressed data in a ring buffer. Using a dictionary bigger than the uncompressed file is useless.
- Compressor memory usage is roughly 5-12 times the dictionary size. It depends on the match finder (see mf under --lzma2 on the man page). "xz -vv" shows the encoder memory usage. I might make single -v show that info in the future along with the decoder memory usage.
- Decompressor memory usage is a little more than the dictionary size. The currently supported extra filters don't use a significant amount of memory.

-- 
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
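
The rules of thumb in the message above can be turned into a rough estimator. This is a sketch built only from the numbers quoted in the thread: the 1 MiB decoder overhead is inferred from the two data points given (2 MiB dictionary needs 3 MiB, 8 MiB needs 9 MiB), the 5x to 12x encoder range is the stated approximation, and all function names are hypothetical. Real figures depend on the match finder and filters, and "xz -vv" is the place to check them.

```python
def decoder_mem_mib(dict_mib, overhead_mib=1):
    """Decoder memory: the dictionary plus a small overhead (assumed 1 MiB)."""
    return dict_mib + overhead_mib


def encoder_mem_range_mib(dict_mib):
    """Encoder memory: roughly 5 to 12 times the dictionary size."""
    return (5 * dict_mib, 12 * dict_mib)


def useful_dict_mib(dict_mib, uncompressed_mib):
    """A dictionary larger than the uncompressed data brings no benefit."""
    return min(dict_mib, uncompressed_mib)


# 8 MiB is the dictionary size recommended for packages ("xz -6").
package_decoder = decoder_mem_mib(8)
package_encoder = encoder_mem_range_mib(8)

# The 2 MiB case that was compared against "bzip2 -9".
small_decoder = decoder_mem_mib(2)

print("decoder, 8 MiB dict:", package_decoder, "MiB")
print("encoder, 8 MiB dict:", package_encoder, "MiB (low, high)")
print("decoder, 2 MiB dict:", small_decoder, "MiB")
```

The upper end of the encoder estimate for an 8 MiB dictionary, 96 MiB, sits close to the "about 100 MiB" quoted for "xz -6", which is as much agreement as a rough multiplier can be expected to give.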