From: Ersek, Laszlo on 30 Jun 2010 12:11

On Wed, 30 Jun 2010, arnuld wrote:

> Problem is how can I be sure that particular length of data will arrive
> in recv(). It can come in any of these partial recv()s
>
> 1st recv(): Content-L
> 2nd recv(): ength: 1345
>
> 1st recv(): Conte
> 2nd recv(): -Length: 1234
>
> 1st recv(): Content-Length:
> 2nd recv(): 1234
>
> 1st recv(): Content-Leng
> 2nd recv(): th: 1234

Yes. You'll have a parser state which stands for "parsing the Content-Length header". This parser state (and the whole parser itself) can be implemented at various levels of sophistication. Probably one of the simplest is:

    enum state {
      ST_0,
      ST_1,
      ST_CONTENT_LENGTH,
      ST_READ_BODY,
      ...
    };

    struct parser {
      char unsigned *recvbuf;
      size_t alloc, used;
      enum state state;
      size_t content_length;
      /* ... */
    };

and you have a function which adds data to "recvbuf" (managing "alloc" and "used" as well), and once added, tries to kick the "state" member as far as possible. (For more, see below.) After processing some data out of recvbuf, you may memmove() the rest to the beginning of "recvbuf". (This is terribly inefficient, but easier to implement and is enough for discussion, hopefully.)

So you have a feed() function which takes the newly received bytes (in fact you'd probably read() directly into recvbuf, but let's ignore that for a moment), "appends" them to recvbuf, and then retries to handle the current state (i.e. advance out of the current state as far as possible, using the recently received bytes). This can be implemented with state-dependent function pointers or with switch statements, among others.

The state handler should be level-triggered, like select(). For example, you should be able to call it repeatedly, without adding any new bytes, with no harm. (This is not a hard requirement at all, just a very basic idea for discussion.)
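The feed() step described above could look roughly like the following. This is a minimal sketch, not the poster's code: the names parser_feed() and parser_consume(), the doubling growth policy and the 64-byte initial size are all assumptions, and the state handler call is left as a comment.

```c
#include <stdlib.h>
#include <string.h>

/* Only the buffer-management members of the post's struct parser. */
struct parser {
  unsigned char *recvbuf;
  size_t alloc, used;
};

/*
 * Append len bytes to recvbuf, growing it geometrically (the doubling
 * policy and the 64-byte initial size are assumptions, not from the post).
 * Returns 0 on success, -1 on size overflow or allocation failure.
 */
static int parser_feed(struct parser *p, const void *data, size_t len)
{
  if (len == 0u)
    return 0;
  if (len > (size_t)-1 - p->used)
    return -1;                          /* used + len would overflow */

  if (p->used + len > p->alloc) {
    size_t need = p->used + len;
    size_t newalloc = p->alloc != 0u ? p->alloc : 64u;
    unsigned char *tmp;

    while (newalloc < need)
      newalloc = newalloc > (size_t)-1 / 2u ? need : newalloc * 2u;
    tmp = realloc(p->recvbuf, newalloc);
    if (tmp == NULL)
      return -1;                        /* old buffer is still valid */
    p->recvbuf = tmp;
    p->alloc = newalloc;
  }

  memcpy(p->recvbuf + p->used, data, len);
  p->used += len;
  /* A full feed() would now run the state handler until it stalls. */
  return 0;
}

/* Discard n (<= used) parsed bytes by moving the rest to the front. */
static void parser_consume(struct parser *p, size_t n)
{
  p->used -= n;
  memmove(p->recvbuf, p->recvbuf + n, p->used);
}
```

Because the handler is level-triggered, calling it after every parser_feed() is harmless even when the new bytes do not complete anything.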
Anyway, the following could be a pseudo-implementation that handles the ST_CONTENT_LENGTH state:

    static int
    try_to_advance(struct parser *p)
    {
      switch (p->state) {
        /* ... */

        case ST_CONTENT_LENGTH:
          {
            /* "Content-Length:" in ASCII */
            static const char unsigned hdr_content_length[] = {
              0x43u, 0x6fu, 0x6eu, 0x74u, 0x65u, 0x6eu, 0x74u, 0x2du,
              0x4cu, 0x65u, 0x6eu, 0x67u, 0x74u, 0x68u, 0x3au
            };

            const char unsigned *newline; /* newline terminating the header */
            size_t advlen;                /* size of parsed header, newline included */

            if (sizeof hdr_content_length > p->used) {
              /* header too short, come back with more */
              return 0;
            }

            if (0 != memcmp(hdr_content_length, p->recvbuf,
                sizeof hdr_content_length)) {
              /* header invalid, shed client */
              return -1;
            }

            if (0 == (newline = memchr(p->recvbuf + sizeof hdr_content_length,
                0x0Au, p->used - sizeof hdr_content_length))) {
              /* header not yet terminated, come back with more */
              return 0;
            }

            /* We have a terminated line, try to parse the integer */
            if (-1 == x_str_to_size_t(&p->content_length,
                p->recvbuf + sizeof hdr_content_length, newline)) {
              /* malformed header, terminate client's connection in outer loop */
              return -1;
            }

            /*
              p->content_length is set up here. Remove the parsed part,
              advance p->state, and fall through to the next case label
              (i.e. state).
            */
            advlen = newline - p->recvbuf + 1u;
            p->used -= advlen;
            (void)memmove(p->recvbuf, newline + 1u, p->used);
            p->state = ST_READ_BODY;
          }
          /* FALLTHROUGH */

        case ST_READ_BODY:
          /* we can rely on p->content_length being filled in here */
          /* ... */

        /* ... */
      }
    }

Note that theoretically the header-parsing code has a worst-case behavior that is (at least) quadratic in time. This is because we restart memchr() from the same point after each recv(). We could save the offset where we gave up the last time and retry only from there. (More precisely, we could save the state *within* ST_CONTENT_LENGTH in more detail.) But in this form, if each recv() reads a single byte before we find the newline, we check 1 + 2 + 3 + 4 + ... bytes until we succeed.

Going back to your examples above,

> 1st recv(): Content-L
> 2nd recv(): ength: 1345
>
> 1st recv(): Conte
> 2nd recv(): -Length: 1234
>
> 1st recv(): Content-Length:
> 2nd recv(): 1234
>
> 1st recv(): Content-Leng
> 2nd recv(): th: 1234

the "sizeof hdr_content_length > p->used" check will hold in all cases after the first recv(). Now suppose

    1st recv(): Content-Length: 123
    2nd recv(): 4\n

then no newline will be found after the first recv().

x_str_to_size_t() does a number of things. I'd probably base it on strtol() (although that would require the platform to be ASCII-based, since our protocol is ASCII-based, and that would simplify the initialization of hdr_content_length too). x_str_to_size_t() must check whether the value can be parsed as a size_t (in fact, the allowed range would be [1 .. min { LONG_MAX, (size_t)-1 }]) and that the parsed decimal string ends where we found the newline (to exclude "Content-Length: 1234XXXX\n"). Whitespace between the colon ":" and the beginning of the subject sequence of strtol() (i.e. the decimal string) is swallowed by strtol().

Note that the above "protocol" is not HTTP at all.

This stuff is very messy and it is very easy to introduce undefined behavior. (I probably did in the code above.) That's why I would suggest describing a protocol so that the parser's implementation can be generated. The generated parser should not try to read the data itself, but expect the programmer to feed it.

lacos
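One possible reading of the two refinements mentioned in this post (saving the memchr() offset, and an x_str_to_size_t() built on strtol()) is sketched below. Assumptions, none of them from the post: an ASCII execution character set (so a string literal replaces the hex table), a hypothetical `scanned` member, a handler name advance_content_length(), and a 32-byte limit on the value text. The value is copied into a NUL-terminated local buffer because recvbuf is not NUL-terminated and strtol() would skip a '\n' as leading whitespace, which is one way to avoid the undefined behavior the post warns about.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum state { ST_CONTENT_LENGTH, ST_READ_BODY };

struct parser {
  unsigned char *recvbuf;
  size_t alloc, used;
  enum state state;
  size_t content_length;
  size_t scanned;   /* hypothetical: bytes after the header name already searched */
};

static const char hdr_content_length[] = "Content-Length:";  /* assumes ASCII */
#define HDR_LEN (sizeof hdr_content_length - 1u)

/* Parse the decimal text in [begin, end) into *out; accepts
 * [1 .. min(LONG_MAX, SIZE_MAX)]. Returns 0 on success, -1 otherwise. */
static int x_str_to_size_t(size_t *out, const unsigned char *begin,
                           const unsigned char *end)
{
  char tmp[32];                 /* assumed upper bound on the value text */
  size_t len = (size_t)(end - begin);
  char *stop;
  long val;

  if (len == 0u || len >= sizeof tmp)
    return -1;
  memcpy(tmp, begin, len);
  tmp[len] = '\0';              /* strtol() now cannot run past the line */

  errno = 0;
  val = strtol(tmp, &stop, 10); /* leading whitespace is skipped here */
  if (stop == tmp || *stop != '\0' || errno == ERANGE || val < 1)
    return -1;                  /* empty, trailing junk, overflow, or < 1 */
  if ((unsigned long)val > SIZE_MAX)
    return -1;
  *out = (size_t)val;
  return 0;
}

/* Returns 1 once the header is parsed, 0 if more data is needed, -1 on error. */
static int advance_content_length(struct parser *p)
{
  const unsigned char *start, *newline;
  size_t advlen;

  if (HDR_LEN > p->used)
    return 0;                   /* too short, come back with more */
  if (memcmp(hdr_content_length, p->recvbuf, HDR_LEN) != 0)
    return -1;                  /* not the expected header */

  /* Search only bytes not examined on earlier calls: linear total work. */
  start = p->recvbuf + HDR_LEN;
  newline = memchr(start + p->scanned, '\n', p->used - HDR_LEN - p->scanned);
  if (newline == NULL) {
    p->scanned = p->used - HDR_LEN;
    return 0;                   /* not terminated yet */
  }
  if (x_str_to_size_t(&p->content_length, start, newline) != 0)
    return -1;                  /* malformed value */

  advlen = (size_t)(newline - p->recvbuf) + 1u;
  p->used -= advlen;
  memmove(p->recvbuf, newline + 1, p->used);
  p->scanned = 0u;
  p->state = ST_READ_BODY;
  return 1;
}
```

With the "Content-Length: 123" / "4\n" split from the post, the first call records that 4 value bytes were already searched, and the second call searches only the new bytes.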
From: Rainer Weikusat on 30 Jun 2010 12:25

Nicolas George <nicolas$george(a)salle-s.org> writes:
> Rainer Weikusat wrote in message <87eifosg5m.fsf(a)fever.mssgmbh.com>:
>> When data is read into a buffer of some maximum size and then parsed,
>> anyway, your assertion that 'using \n as line terminator would be
>> annoying' doesn't make any sense anymore, at least to me. Care to
>> elaborate what those 'annoyances' are supposed to be?
>
> Knowing in advance the size of the data avoids all the dynamic
> reallocation:

There is no need to do 'dynamic reallocation' when parsing the contents of some input buffer provided the buffer size is larger than the record size. Usually, one would have a start pointer and a 'run' pointer, and whenever the run pointer points to a \n (or any other kind of 'record terminating marker'), a record with a length of run - start starting at start has been found.

[...]

>
>> IIRC, the last time I saw an actual character-based terminal was about
>> a decade ago and it was already a rare curiosity at these times. Also
>
> So what?

You claimed that

,----
| The original idea is that the terminals randomly added \r in what they
| emitted and sometimes required them to display things properly.
|
| Fortunately, the days where network protocols were directly connected to a
| terminal ended a good decade ago.
`----

A decade ago, nobody was using character-based terminals anymore, and especially not 'connecting them to network protocols', whatever that is supposed to mean. In addition to this,

>> the original SMTP RFC (822) specifically allowed both \r and \n as
>> part of the user data (this has meanwhile been retracted) and
>> consequently, at least the SMTP line terminator must be something
>> different from either of both, independently of what you were referring
>> to above.
>
> And what is it supposed to prove?
That SMTP needs to terminate lines with something other than \r or \n because the original SMTP RFC specifically allowed \r or \n as part of the data payload. Neither this RFC (nor any other I am aware of) contains anything regarding the need to work around broken 'terminals' of any kind and especially not 'broken terminals' which were in use only a decade ago, as you stated.
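Rainer's start/run pointer scheme for terminator-delimited records can be sketched as follows. This assumes '\n' as the record terminator and a buffer that already holds the received bytes; next_record() is a hypothetical name. No allocation happens here: the caller keeps the unparsed tail for the next read, and if the buffer fills up without a terminator, the record is longer than the buffer and the caller has to decide what to do.

```c
#include <stddef.h>

/*
 * Return the next complete '\n'-terminated record at or after *start,
 * store its length (terminator excluded) in *len, and move *start past
 * the terminator. Returns NULL if only an incomplete record (or nothing)
 * is left in [*start, end); *start then marks the unparsed tail.
 */
static const char *next_record(const char **start, const char *end,
                               size_t *len)
{
  const char *run;   /* the 'run' pointer from the post */

  for (run = *start; run < end; ++run) {
    if (*run == '\n') {
      const char *rec = *start;

      *len = (size_t)(run - rec);   /* record length is run - start */
      *start = run + 1;
      return rec;
    }
  }
  return NULL;
}
```

Each byte is examined once per call sequence, so the cost of splitting a buffer is linear in its size.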
From: Ersek, Laszlo on 30 Jun 2010 12:48

On Wed, 30 Jun 2010, Ersek, Laszlo wrote:

> On Wed, 30 Jun 2010, arnuld wrote:
>
>> 1st recv(): Content-L
>> 2nd recv(): ength: 1345
>>
>> 1st recv(): Conte
>> 2nd recv(): -Length: 1234
>>
>> 1st recv(): Content-Length:
>> 2nd recv(): 1234
>>
>> 1st recv(): Content-Leng
>> 2nd recv(): th: 1234
>
> the "sizeof hdr_content_length > p->used" check will hold in all cases after
> the first recv().

Except in the third one, sorry. In that case, memchr() will search zero bytes and then return a null pointer. (No ASCII NL found.) I changed >= to > in the first check, but then failed to update this section completely.

lacos
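The corrected case analysis can be checked with a small helper that mirrors only the first checks of the ST_CONTENT_LENGTH handler. The name content_length_line_ready() is hypothetical, and the string literal assumes an ASCII platform.

```c
#include <string.h>

/*
 * Returns 1 if a terminated "Content-Length:" line is present, 0 if more
 * data is needed, -1 if the buffer does not start with the header name.
 */
static int content_length_line_ready(const unsigned char *buf, size_t used)
{
  static const char hdr[] = "Content-Length:";
  const size_t hdrlen = sizeof hdr - 1u;

  if (hdrlen > used)
    return 0;                 /* cases 1, 2 and 4 after the first recv() */
  if (memcmp(hdr, buf, hdrlen) != 0)
    return -1;
  /* Case 3: used == hdrlen, so memchr() searches zero bytes and returns
   * a null pointer, which also means "come back with more". */
  return memchr(buf + hdrlen, '\n', used - hdrlen) != NULL;
}
```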
From: Nicolas George on 30 Jun 2010 12:54

Rainer Weikusat wrote in message <87aaqcsd19.fsf(a)fever.mssgmbh.com>:
> There is no need to do 'dynamic reallocation' when parsing the contents
> of some input buffer provided the buffer size is larger than the
> record size.

I do not know how you like to program, or even if you can program at all, but when I design a program, I like it to be able to deal with big inputs when necessary, but not allocate huge amounts of memory each time it reads a few dozen octets.

Now, I do not know why I need to explain this: either you have already implemented anything remotely related to network protocols and what I wrote should be obvious, or you have not and I suggest you try some before annoying everyone here further.

> A decade ago, nobody was using character-based terminals anymore, and
> especially not 'connecting them to network protocols' whatever that is
> supposed to mean. In addition to this,

> That SMTP needs to terminate lines with something other than \r or \n
> because the original SMTP RFC specifically allowed \r or \n as part of
> the data payload. Neither this RFC (nor any other I am aware of)
> contains anything regarding the need to work around broken 'terminals'
> of any kind and especially not 'broken terminals' which were in use
> only a decade ago, as you stated.

You really do not want to understand anything ever, do you?
From: Rainer Weikusat on 30 Jun 2010 13:08
Nicolas George <nicolas$george(a)salle-s.org> writes:
> Rainer Weikusat wrote in message <87aaqcsd19.fsf(a)fever.mssgmbh.com>:
>> There is no need to do 'dynamic reallocation' when parsing the contents
>> of some input buffer provided the buffer size is larger than the
>> record size.
>
> I do not know how you like to program, or even if you can program at
> all, but when I design a program, I like it to be able to deal with big inputs
> when necessary, but not allocate huge amounts of memory each time it reads
> a few dozen octets.

Fine. Back to square one: assuming you send an a priori unknown record size which has neither a practical nor a theoretical limit, you may need to do 'dynamic buffer reallocation' after having received the length, and possibly even while receiving the length, so this buys you exactly nothing. In the real world, sizes of 'records' used for network communication are usually bounded, so this issue doesn't exist.

[...]

>> A decade ago, nobody was using character-based terminals anymore, and
>> especially not 'connecting them to network protocols' whatever that is
>> supposed to mean. In addition to this,
>
>> That SMTP needs to terminate lines with something other than \r or \n
>> because the original SMTP RFC specifically allowed \r or \n as part of
>> the data payload. Neither this RFC (nor any other I am aware of)
>> contains anything regarding the need to work around broken 'terminals'
>> of any kind and especially not 'broken terminals' which were in use
>> only a decade ago, as you stated.
>
> You really do not want to understand anything ever, do you?

So far, you have posted a couple of assertions I have refuted twice and your only 'argument' has been 'being abusive'. I understand that you are probably just a jerk. Better?
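The two positions in this exchange meet in one concrete pattern: once a length prefix has been parsed, the body buffer can be allocated once at its final size, provided the length is checked against some upper bound. A minimal sketch, where struct body, body_init(), body_fill() and the caller-supplied `limit` are all hypothetical names chosen for illustration; zero lengths are rejected to match the [1 .. ...] range suggested earlier in the thread.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical body accumulator: the length is known up front, so the
 * buffer is allocated once at its final size and never reallocated. */
struct body {
  unsigned char *data;
  size_t want, have;
};

/* Allocate for a known length, rejecting anything above a caller-chosen
 * limit (an assumed policy for bounding record sizes). */
static int body_init(struct body *b, size_t content_length, size_t limit)
{
  if (content_length == 0u || content_length > limit)
    return -1;
  b->data = malloc(content_length);
  if (b->data == NULL)
    return -1;
  b->want = content_length;
  b->have = 0u;
  return 0;
}

/* Copy at most the missing number of bytes from src; returns how many of
 * the len bytes were consumed. Extra bytes belong to the next message. */
static size_t body_fill(struct body *b, const void *src, size_t len)
{
  size_t n = b->want - b->have;

  if (n > len)
    n = len;
  memcpy(b->data + b->have, src, n);
  b->have += n;
  return n;
}
```

Without the limit check, an attacker-controlled length would turn the single allocation into exactly the unbounded memory use both posters want to avoid.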