-
-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not use keywords for headers #1
Comments
Hmm, this must be serious. But a property list with keyword keys is useful for high-level layer. How about just limiting the length of header fields? FYI, fast-http provides low-level API like |
Let's say we set length limit to 20, then if I'm correct the number of possible strings with 20 chars is at least 4.239115828×10²⁸ (27^20, we are ignoring case so at least 26 alphabet characters plus #'-) or at least 8.478231655×10²⁹ bytes. Then again if I'm correct we can continue adding with strings with 19 chars and so on. Taking this into account I don't think symbols here are actually usable, what we're using internally (since people really love keywords :-) is https://github.com/deadtrickster/ia-hash-table based on the idea stolen from RoR. As for low-level things, I glanced through code and it looks like you're using babel with default encoding (for me it's utf-8). I observed that trivial-utf-8 is slightly faster:
Maybe, UTF-8 is overkill for header fields, urls? |
20 chars limit is too strict. I choose 30 for the default and allow to change it by a special variable. |
(ql:quickload :fast-http)
(ql:quickload :cl-interpol)
(ql:quickload :trivial-utf-8)
(cl-interpol:enable-interpol-syntax)
(defun random-garbage (length)
(let ((chars "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-")
(garbage (make-string length)))
(dotimes (i length)
(setf (aref garbage i) (aref chars (random (length chars)))))
garbage))
(defun test-memory-leak (iterations)
(loop for i from 1 to iterations do
(let* ((request (list #?"POST / HTTP/1.1\r\n"
#?"Host: www.example.com\r\n"
#?"${(random-garbage 30)}: application/x-www-form-urlencoded\r\n"
#?"${(random-garbage 30)}: garbage\r\n"
#?"${(random-garbage 30)}: w-form-urlencoded\r\n"
#?"${(random-garbage 30)}: application/xurlencoded\r\n"
#?"${(random-garbage 30)}: applicatiurlencoded\r\n"
#?"${(random-garbage 30)}: application/x-wwwm-urlencoded\r\n"
#?"${(random-garbage 30)}: applicatwww-form-urlencoded\r\n"
#?"${(random-garbage 30)}: applicaform-urlencoded\r\n"
#?"${(random-garbage 30)}: applicationw-form-urlencoded\r\n"
#?"${(random-garbage 30)}: appltiorm-urlencoded\r\n"
#?"${(random-garbage 30)}: appliorm-urlencoded\r\n"
#?"${(random-garbage 30)}: appatiorm-lencoded\r\n"
#?"${(random-garbage 30)}: applitiorm-lencoded\r\n"
#?"${(random-garbage 30)}: applicatiorm-urlencoded\r\n"
#?"${(random-garbage 30)}: applicatiorm-\r\n"
#?"${(random-garbage 30)}: applitiorm-urlencoded\r\n"
#?"${(random-garbage 30)}: appliorm-urlencoded\r\n"
#?"${(random-garbage 30)}: applicrm-urlncoded\r\n"
#?"${(random-garbage 30)}: applicatiorm-\r\n"
#?"Content-Length: 4\r\n"
#?"Connection: close\r\n"
#?"\r\n"
#?"q=42\r\n"))
(http (fast-http:make-http-request :store-body t ))
(parser (fast-http:make-parser http :store-body t :header-callback (lambda (h) (declare (ignore h)) ))))
(loop for req-chunk in request do
(funcall parser (trivial-utf-8:string-to-utf-8-bytes req-chunk))))
(if (= 0 (mod i 10000))
(format t "~a Iterations~%" i))))
(define-alien-routine print-generation-stats void)
(sb-ext:gc :full t)
(print-generation-stats)
;; MAGIC
(test-memory-leak 20000000) ;; maybe you should adapt this constant to your memory settings
less than 190000 iterations! |
You're right. I think we should just stop using keywords in every other projects, Clack, Caveman2, ningle, clack-errors, and maybe hermetic? |
Summary
|
I'm fine with updating this in Wookie once moving to fast HTTP (thanks for the heads up) as well as making a general announcement about the change. Suggestion: can we lowercase the header keys in the hash table? Seems to work well for the Node folks, and I don't think I've ever seen a case where both "Content-Type" and "content-type" are together in the same request. |
* The all keys are downcased. * If there're multiple header line which has the same key, the values will be combined with ', ' (RFC 2616)
Okay, I changed the data type of HTTP headers to hash-table in the above commit. The all keys are downcased as orthecreedence said. If there're multiple header lines which have the same key, the values will be combined with ', '. (RFC 2616) |
I forgot to say, thank you @deadtrickster for your kind warning by showing clear examples. |
Fukamachi, thank you for fix, however :-)
Of course we can force users with lowercase strings, but maybe there something we can do about? (we can play with test and hash functions, test function obviously will be string-equal, and hash function can be probably stolen from say sbcl [ |
Interesting point. What do you think about adding a special variable or a keyword parameter to specify the case of header fields, like (make-parser http
:header-callback (lambda (headers) ...)
:body-callback (lambda (bytes) ...)
:header-fields-case :capitalize) Or making it possible to take a hash-table test function would be another option. I'm still thinking. |
How I'm voting for hash-table solution |
Are custom hash test functions portable (as in, will support most CL implementations)? I've had bad luck, but admittedly didn't dive very deep. If they are portable, then I'm in favor of using a case-insensitive hash test for headers, otherwise specifying |
Maybe custom-hash-table source can help? If it can't then |
on this topic, it could be worthwhile to look at the way cl-http manages headers. john mallory's concern about symbols as an attack vector was the motivation for cl-xml to allow for dynamically bound token tables an as alternative to keywords for generic identifiers. |
Out of curiosity, why were association lists not considered, or if they were, how is the overhead of a hash table justified? |
While issue not SBCL specific this code is:
Output
PS. funny thing - chunga and its friends do the same mistake.
The text was updated successfully, but these errors were encountered: