You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
120 lines
4.7 KiB
120 lines
4.7 KiB
From e5a741f94977840c58775b38f8ed830207f7e4d0 Mon Sep 17 00:00:00 2001
|
|
From: Willy Tarreau <w@1wt.eu>
|
|
Date: Tue, 8 Aug 2023 16:17:22 +0200
|
|
Subject: [PATCH] BUG/MINOR: h1: do not accept '#' as part of the URI component
|
|
|
|
Seth Manesse and Paul Plasil reported that the "path" sample fetch
|
|
function incorrectly accepts '#' as part of the path component. This
|
|
can in some cases lead to misrouted requests for rules that would apply
|
|
on the suffix:
|
|
|
|
use_backend static if { path_end .png .jpg .gif .css .js }
|
|
|
|
Note that this behavior can be selectively configured using
|
|
"normalize-uri fragment-encode" and "normalize-uri fragment-strip".
|
|
|
|
The problem is that while the RFC says that this '#' must never be
|
|
emitted, as often it doesn't suggest how servers should handle it. A
|
|
diminishing number of servers still do accept it and trim it silently,
|
|
while others are rejecting it, as indicated in the conversation below
|
|
with other implementers:
|
|
|
|
https://lists.w3.org/Archives/Public/ietf-http-wg/2023JulSep/0070.html
|
|
|
|
Looking at logs from publicly exposed servers, such requests appear at
|
|
a rate of roughly 1 per million and only come from attacks or poorly
|
|
written web crawlers incorrectly following links found on various pages.
|
|
|
|
Thus it looks like the best solution to this problem is to simply reject
|
|
such ambiguous requests by default, and include this in the list of
|
|
controls that can be disabled using "option accept-invalid-http-request".
|
|
|
|
We're already rejecting URIs containing any control char anyway, so we
|
|
should also reject '#'.
|
|
|
|
In the H1 parser for the H1_MSG_RQURI state, there is an accelerated
|
|
parser for bytes 0x21..0x7e that has been tightened to 0x24..0x7e (it
|
|
should not impact perf since 0x21..0x23 are not supposed to appear in
|
|
a URI anyway). This way '#' falls through the fine-grained filter and
|
|
we can add the special case for it also conditionned by a check on the
|
|
proxy's option "accept-invalid-http-request", with no overhead for the
|
|
vast majority of valid URIs. Here this information is available through
|
|
h1m->err_pos that's set to -2 when the option is here (so we don't need
|
|
to change the API to expose the proxy). Example with a trivial GET
|
|
through netcat:
|
|
|
|
[08/Aug/2023:16:16:52.651] frontend layer1 (#2): invalid request
|
|
backend <NONE> (#-1), server <NONE> (#-1), event #0, src 127.0.0.1:50812
|
|
buffer starts at 0 (including 0 out), 16361 free,
|
|
len 23, wraps at 16336, error at position 7
|
|
H1 connection flags 0x00000000, H1 stream flags 0x00000810
|
|
H1 msg state MSG_RQURI(4), H1 msg flags 0x00001400
|
|
H1 chunk len 0 bytes, H1 body len 0 bytes :
|
|
|
|
00000 GET /aa#bb HTTP/1.0\r\n
|
|
00021 \r\n
|
|
|
|
This should be progressively backported to all stable versions along with
|
|
the following patch:
|
|
|
|
REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests
|
|
|
|
Similar fixes for h2 and h3 will come in followup patches.
|
|
|
|
Thanks to Seth Manesse and Paul Plasil for reporting this problem with
|
|
detailed explanations.
|
|
|
|
(cherry picked from commit 2eab6d354322932cfec2ed54de261e4347eca9a6)
|
|
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
|
|
(cherry picked from commit 9bf75c8e22a8f2537f27c557854a8803087046d0)
|
|
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
|
|
(cherry picked from commit 9facd01c9ac85fe9bcb331594b80fa08e7406552)
|
|
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
|
|
(cherry picked from commit 832b672eee54866c7a42a1d46078cc9ae0d544d9)
|
|
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
---
|
|
src/h1.c | 15 +++++++++++----
|
|
1 file changed, 11 insertions(+), 4 deletions(-)
|
|
|
|
diff --git a/src/h1.c b/src/h1.c
|
|
index eeda311b7..91d3dc47a 100644
|
|
--- a/src/h1.c
|
|
+++ b/src/h1.c
|
|
@@ -480,13 +480,13 @@ int h1_headers_to_hdr_list(char *start, const char *stop,
|
|
case H1_MSG_RQURI:
|
|
http_msg_rquri:
|
|
#ifdef HA_UNALIGNED_LE
|
|
- /* speedup: skip bytes not between 0x21 and 0x7e inclusive */
|
|
+ /* speedup: skip bytes not between 0x24 and 0x7e inclusive */
|
|
while (ptr <= end - sizeof(int)) {
|
|
- int x = *(int *)ptr - 0x21212121;
|
|
+ int x = *(int *)ptr - 0x24242424;
|
|
if (x & 0x80808080)
|
|
break;
|
|
|
|
- x -= 0x5e5e5e5e;
|
|
+ x -= 0x5b5b5b5b;
|
|
if (!(x & 0x80808080))
|
|
break;
|
|
|
|
@@ -498,8 +498,15 @@ int h1_headers_to_hdr_list(char *start, const char *stop,
|
|
goto http_msg_ood;
|
|
}
|
|
http_msg_rquri2:
|
|
- if (likely((unsigned char)(*ptr - 33) <= 93)) /* 33 to 126 included */
|
|
+ if (likely((unsigned char)(*ptr - 33) <= 93)) { /* 33 to 126 included */
|
|
+ if (*ptr == '#') {
|
|
+ if (h1m->err_pos < -1) /* PR_O2_REQBUG_OK not set */
|
|
+ goto invalid_char;
|
|
+ if (h1m->err_pos == -1) /* PR_O2_REQBUG_OK set: just log */
|
|
+ h1m->err_pos = ptr - start + skip;
|
|
+ }
|
|
EAT_AND_JUMP_OR_RETURN(ptr, end, http_msg_rquri2, http_msg_ood, state, H1_MSG_RQURI);
|
|
+ }
|
|
|
|
if (likely(HTTP_IS_SPHT(*ptr))) {
|
|
sl.rq.u.len = ptr - sl.rq.u.ptr;
|
|
--
|
|
2.43.0
|
|
|