| Date: January 5, 2006 |
| Author: Daniel Stenberg |
| |
| Status of project Hiper - high performance libcurl modifications |
| ================================================================ |
| |
| What is Hiper |
| |
| You won't find such a description in this document. See |
| http://curl.haxx.se/libcurl/hiper/ for further details. |
| |
| Live Progress Info |
| |
| During my work, I've posted occational updates on the curl-library mailing |
| list but more importantly done frequent updates of |
| http://curl.haxx.se/libcurl/hiper/schedule.html |
| |
| Schedule |
| |
| I took time off my regular job during Decemember 2005 and the first week of |
| January 2006 to work on hiper full-time. |
| |
| Step 1 - Measure the Existing Solution |
| |
| I started full-time work on project Hiper on December 1st 2005. I began by |
| putting together a test application that used the existing API to allow me |
| to properly and with accuracy measure execution and transfer speeds when |
| doing a large amount of transfers. |
| |
| I soon discovered that it was impossible to do any sensible measurements by |
| using live and actual URLs since the transfers were too unrelialble and |
| uncontrolled. I then enhanced the current HTTP server in the curl test suite |
| and made that support a large amount of transfers and some extra magic |
| "commands" that would make the server either just sit "idle" or "stream" |
| (continuously sending data in a never-ending stream). I then wrote up two |
| files using the curl test suite file format and by acessing the properly |
| formatted URLs on my localhost the HTTP server would either run "idle" or |
| run "stream". |
| |
| Having this working, I patched libcurl to always only recv() a single byte |
| off the network each time, just to make sure that the time spent on reading |
| data is constant and never very long. |
| |
| I adjusted the test application (actually called 'hiper') to create Y idle |
| transfers and Z stream transfers, had it run for N seconds and then quit and |
| produce a summary on stdout. Now I got very solid and repeatable results. I |
| started to run repeated tests and save the results when I ran into the |
| dreaded 1024 socket maximum limit. |
| |
| One side of the problem is that the fd_set type only allows 1024 file |
| descriptors (on my Linux), which I had to solve by simply making my own type |
| with room for more connections and do ugly typecasts in the code. The other |
| side of the problem is that user applications have a limit imposed by the |
| system on the maximum amount of file descriptors it can have open and I had |
| to work around that by writing a special tool that runs setuid root that |
| increases the limit, downgrades to a normal user again and then run the |
| command line of your choice. This second approach has to be used for both |
| 'hiper' and the test HTTP server. (You need to build the HTTP server with |
| CURL_SWS_FORK_ENABLED defined to have it do forks since it isn't desirable |
| to do so when running the normal curl tests.) |
| |
| Now I could run my test program without problems. I decided to run the tests |
| with 1 stream connection and a varying amount of idle ones. I did 1001, |
| 2001, 3001, 5001 and 9001 connections and measured how long select() and |
| curl_multi_perform() (including the curl_multi_fdset() call) would take in |
| average, over a period of 20 seconds. I ran each test 5-6 times and I used |
| the average time of all the runs. |
| |
| The times in number of microseconds: |
| |
| Connections multi_perform select |
| 1001 3504 951 |
| 2001 7606 1988 |
| 3001 11045 2715 |
| 5001 16406 4024 |
| 9001 32147 8030 |
| |
| Test system |
| CPU: Athlon XP 2800 |
| RAM: 1 GB |
| Linux: 2.6 |
| glibc: 2.3.5 |
| libcurl: 7.15.1 |
| |
| The only reason I stopped at 9001 connections is that my test machine ran |
| out of avaiable memory by then as I ran the test server on the same machine, |
| and I didn't want to risk the test result accuracy by having it start using |
| the swap during the tests. |
| |
| It means that at 9000 connections we spend 40ms for each socket action, even |
| when only one socket ever have action. |
| |
| With these 32000 microseconds curl_multi_perform() takes for 9000 |
| connections, it loops 18000 laps which makes less than 2 microseconds per |
| lap. (Of course counting time/laps is an oversimplification, but anyway.) |
| Hopefully we should achieve less than 10 microseconds for each call to |
| curl_multi_socket() for an active connection. |
| |
| The timing graph displayed on the libevent site (duplicated on the hiper |
| project page) suggests that libevent is pretty much fixed at 50 microseconds |
| (although I don't know what test box was used in their testing, we can |
| compare the select()-times from my tests and see that they are at least |
| resonably close). |
| |
| Summing up, the current ~40 ms spent at 9000 connections could then possibly |
| be lowered to something around 60 us! |
| |
| Step 2 - Implement curl_multi_socket API |
| |
| Most of the design decisions and debates about this new API have already |
| been held on the curl-library mailing list a long time ago so I had a basic |
| idea on what approach to use. The main ideas of the new API are simply: |
| |
| 1 - The application can use whatever event system it likes as it gets info |
| from libcurl about what file descriptors libcurl waits for what action |
| on. (The previous API returns fd_sets which is very select()-centric). |
| |
| 2 - When the application discovers action on a single socket, it calls |
| libcurl and informs that there was action on this particular socket and |
| libcurl can then act on that socket/transfer only and not care about |
| any other transfers. (The previous API always had to scan through all |
| the existing transfers.) |
| |
| The idea is that curl_multi_socket() calls a given callback with information |
| about what socket to wait for what action on, and the callback only gets |
| called if the status of that socket has changed. |
| |
| In the API draft from before, we have a timeout argument on a per socket |
| basis and we also allowed curl_multi_socket() to pass in an 'easy handle' |
| instead of socket to allow libcurl to shortcut a lookup and work on the |
| affected easy handle right away. Both these turned out to be bad ideas. |
| |
| The timeout argument was removed from the socket callback since after much |
| thinking I came to the conclusion that we really don't want to handle |
| timeouts on a per socket basis. We need it on a per transfer (easy handle) |
| basis and thus we can't provide it in the callbacks in a nice way. Instead, |
| we have to offer a curl_multi_timeout() that returns the largest amount of |
| time we should wait before we call the "timeout action" of libcurl, to |
| trigger the proper internal timeout action on the affected transfer. To get |
| this to work, I added a struct to each easy handle in which we store an |
| "expire time" (if any). The structs are then "splay sorted" so that we can |
| add and remove times from the linked list and yet somewhat swiftly figure |
| out 1 - how long time there is until the next timer expires and 2 - which |
| timer (handle) should we take care of now. Of course, the upside of all this |
| is that we get a curl_multi_timeout() that should also work with old-style |
| applications that use curl_multi_perform(). |
| |
| The easy handle argument was removed fom the curl_multi_socket() function |
| because having it there would require the application to do a socket to easy |
| handle conversion on its own. I find it very unlikely that applications |
| would want to do that and since libcurl would need such a lookup on its own |
| anyway since we didn't want to force applications to do that translation |
| code (it would be optional), it seemed like an unnecessary option. I also |
| realized that when we use underlying libraries such as c-ares (for DNS |
| asynch resolving) there might in fact be more than one transfer waiting for |
| action on the same socket and thus it makes the lookup even tricker and even |
| less likely to ever get done by applications. Instead I created an internal |
| "socket to easy handles" hash table that given a socket (file descriptor) |
| returns a list of easy handles that waits for some action on that socket. |
| |
| To make libcurl be able to report plain sockets in the socket callback, I |
| had to re-organize the internals of the curl_multi_fdset() etc so that the |
| conversion from sockets to fd_sets for that function is only done in the |
| last step before the data is returned. I also had to extend c-ares to get a |
| function that can return plain sockets, as that library too returned only |
| fd_sets and that is no longer good enough. The changes done to c-ares have |
| been committed and are available in the c-ares CVS repository destined to be |
| included in the upcoming c-ares 1.3.1 release. |
| |
| The 'shiper' tool is the test application I wrote that uses the new |
| curl_multi_socket() in its current state. It seems to be working and it uses |
| the API as it is documented and supposed to work. It is still using |
| select(), because I needed that during development (like until I had the |
| socket hash implemented etc) and because I haven't yet learned how to use |
| libevent or similar. |
| |
| The hiper/shiper tools are very simple and initiates lots of connections and |
| have them running for the test period and then kills them all. |
| |
| Since I wasn't done with the implementation until early January I haven't |
| had time to run very many measurements and checks, but I have done a few |
| runs with up to a few hundred connections (with a single active one). The |
| curl_multi_socket() invoke then takes 3-6 microseconds in average (using the |
| read-only-1-byte-at-a-time hack). If this number does increase a lot when we |
| add connections, it certainly matches my in my opinion very ambitious goal. |
| We are now below the 60 microseconds "per socket action" goal. It is |
| destined to be somewhat higher the more connections we have since the hash |
| table gets more populated and the splay tree will grow etc. |
| |
| Some tests at 7000 and 9000 connections showed that the socket hash lookup |
| is somewhat of a bottle neck. Its current implementation may be a bit too |
| limiting. It simply has a fixed-size array, and on each entry in the array |
| it has a linked list with entries. So the hash only checks which list to |
| scan through. The code I had used so for used a list with merely 7 slots (as |
| that is what the DNS hash uses) but with 7000 connections that would make an |
| average of 1000 nodes in each list to run through. I upped that to 97 slots |
| (I believe a prime is suitable) and noticed a significant speed increase. I |
| need to reconsider the hash implementation or use a rather large default |
| value like this. At 9000 connections I was still below 10us per call. |
| |
| Status Right Now |
| |
| The curl_multi_socket() API is implemented according to how it is |
| documented. The man pages for curl_multi_socket and curl_multi_timeout are |
| both committed to CVS and are available online for easy browsing: |
| |
| http://curl.haxx.se/libcurl/c/curl_multi_socket.html |
| http://curl.haxx.se/libcurl/c/curl_multi_timeout.html |
| |
| The hiper-5.patch I made available early morning January 5th, 2006 should |
| apply fine on a recent CVS checkout (at the time of this writing curl 7.15.1 |
| is the latest public curl release but the hiper patch does not apply fine on |
| that). |
| |
| What is Left for the curl_multi_socket API |
| |
| 1 - More measuring with more extreme number of connections |
| |
| 2 - More testing with actual URLs and complete from start to end transfers. |
| |
| I'm quite sure we don't set expire times all over in the code properly, so |
| there is bound to be some timeout bugs left. |
| |
| What it really takes is for me to commit the code and to make an official |
| release with it so that we get people "out there" to help out testing it. |
| |
| What is Left for project Hiper |
| |
| 1 - Add HTTP pipelining support |
| |
| 2 - Add a zero (or at least close to zero) copy interface |
| |
| Neither of these points have been planned or detailed exactly how they will |
| be implemented. |
| |
| Roadmap Ahead |
| |
| I plan and hope to return to full-time hiper work later on this spring or |
| possibly summer to continue where I pause now. Of course some spare time |
| might also be spent until then to get us moving forward. |
| |
| --------------------------------------------------------------------------- |
| |
| April 11, 2006 |
| |
| While sitting staring on my screen trying to write up a *nice* sample script |
| using libevent, it strikes me that since libevent is pretty much based around |
| its structs that you setup for each event/file descriptor, my application |
| wants to figure out the correct struct that is associted with the file |
| descriptor that libcurl provides in the socket callback. |
| |
| This feels like an operation most applications will need when using the |
| multi_socket API, so it feels like I should better try to figure out a decent |
| way to offer this basic functionality already in libcurl - and the fact that |
| we already have the file descriptors in a hash we can probably just as well |
| extend it somewhat and store some custom pointers as well. |
| |
| We need to offer the app a way to set a private pointer to be associated with |
| the particular file descriptor, and then be able to provide that pointer on |
| subsequent callback calls. |
| |
| --------------------------------------------------------------------------- |
| |
| April 20, 2006 |
| |
| I was wrong when I previously claimed we could have more than one easy handle |
| using the same socket. I've cleaned up and simplified code now to adjust to |
| this. |
| |
| --------------------------------------------------------------------------- |
| |
| July 9, 2006 |
| |
| TODO: We need to alter how we use c-ares for getting info about its sockets, |
| as c-ares now provides a callback approach very similar to how libcurl is |
| about to work. |
| |
| I'm adding a function called curl_multi_assign() that will set a private |
| pointer added to the internal libcurl hash table for the particular socket |
| passed in to this function: |
| |
| CURLMcode curl_multi_assign(CURLM *multi_handle, |
| curl_socket_t sockfd, |
| void *sockp); |
| |
| 'sockp' being a custom pointer set by the application to be associated with |
| this socket. The socket has to be already existing and in-use by libcurl, |
| like having already called the callback telling about its existance. |
| |
| The set hashp pointer will then be passed on to the callback in upcoming |
| calls when this same socket is used (in the brand new 'socketp' argument). |
| |
| --------------------------------------------------------------------------- |
| |
| July 30, 2006 |
| |
| Shockingly stupid (of me not having realized this before), but we really need |
| to add a 'running_handles' argument to the curl_multi_socket() and |
| curl_multi_socket_all() prototypes so that the caller can get to know when |
| all the transfers are actually done! |