hiper/STATUS - third_party/curl - Git at Google

 Date:   January 5, 2006
 Author: Daniel Stenberg

        Status of project Hiper - high performance libcurl modifications
        ================================================================

 What is Hiper

   You won't find such a description in this document. See
   http://curl.haxx.se/libcurl/hiper/ for further details.

 Live Progress Info

   During my work, I've posted occational updates on the curl-library mailing
   list but more importantly done frequent updates of
   http://curl.haxx.se/libcurl/hiper/schedule.html

 Schedule

   I took time off my regular job during Decemember 2005 and the first week of
   January 2006 to work on hiper full-time.

 Step 1 - Measure the Existing Solution

   I started full-time work on project Hiper on December 1st 2005. I began by
   putting together a test application that used the existing API to allow me
   to properly and with accuracy measure execution and transfer speeds when
   doing a large amount of transfers.

   I soon discovered that it was impossible to do any sensible measurements by
   using live and actual URLs since the transfers were too unrelialble and
   uncontrolled. I then enhanced the current HTTP server in the curl test suite
   and made that support a large amount of transfers and some extra magic
   "commands" that would make the server either just sit "idle" or "stream"
   (continuously sending data in a never-ending stream). I then wrote up two
   files using the curl test suite file format and by acessing the properly
   formatted URLs on my localhost the HTTP server would either run "idle" or
   run "stream".

   Having this working, I patched libcurl to always only recv() a single byte
   off the network each time, just to make sure that the time spent on reading
   data is constant and never very long.

   I adjusted the test application (actually called 'hiper') to create Y idle
   transfers and Z stream transfers, had it run for N seconds and then quit and
   produce a summary on stdout. Now I got very solid and repeatable results. I
   started to run repeated tests and save the results when I ran into the
   dreaded 1024 socket maximum limit.

   One side of the problem is that the fd_set type only allows 1024 file
   descriptors (on my Linux), which I had to solve by simply making my own type
   with room for more connections and do ugly typecasts in the code. The other
   side of the problem is that user applications have a limit imposed by the
   system on the maximum amount of file descriptors it can have open and I had
   to work around that by writing a special tool that runs setuid root that
   increases the limit, downgrades to a normal user again and then run the
   command line of your choice. This second approach has to be used for both
   'hiper' and the test HTTP server. (You need to build the HTTP server with
   CURL_SWS_FORK_ENABLED defined to have it do forks since it isn't desirable
   to do so when running the normal curl tests.)

   Now I could run my test program without problems. I decided to run the tests
   with 1 stream connection and a varying amount of idle ones. I did 1001,
   2001, 3001, 5001 and 9001 connections and measured how long select() and
   curl_multi_perform() (including the curl_multi_fdset() call) would take in
   average, over a period of 20 seconds. I ran each test 5-6 times and I used
   the average time of all the runs.

   The times in number of microseconds:

     Connections  multi_perform  select
     1001         3504           951
     2001         7606           1988
     3001         11045          2715
     5001         16406          4024
     9001         32147          8030

   Test system
     CPU:     Athlon XP 2800
     RAM:     1 GB
     Linux:   2.6
     glibc:   2.3.5
     libcurl: 7.15.1

   The only reason I stopped at 9001 connections is that my test machine ran
   out of avaiable memory by then as I ran the test server on the same machine,
   and I didn't want to risk the test result accuracy by having it start using
   the swap during the tests.

   It means that at 9000 connections we spend 40ms for each socket action, even
   when only one socket ever have action.

   With these 32000 microseconds curl_multi_perform() takes for 9000
   connections, it loops 18000 laps which makes less than 2 microseconds per
   lap. (Of course counting time/laps is an oversimplification, but anyway.)
   Hopefully we should achieve less than 10 microseconds for each call to
   curl_multi_socket() for an active connection.

   The timing graph displayed on the libevent site (duplicated on the hiper
   project page) suggests that libevent is pretty much fixed at 50 microseconds
   (although I don't know what test box was used in their testing, we can
   compare the select()-times from my tests and see that they are at least
   resonably close).

   Summing up, the current ~40 ms spent at 9000 connections could then possibly
   be lowered to something around 60 us!

 Step 2 - Implement curl_multi_socket API

   Most of the design decisions and debates about this new API have already
   been held on the curl-library mailing list a long time ago so I had a basic
   idea on what approach to use. The main ideas of the new API are simply:

    1 - The application can use whatever event system it likes as it gets info
        from libcurl about what file descriptors libcurl waits for what action
        on. (The previous API returns fd_sets which is very select()-centric).

    2 - When the application discovers action on a single socket, it calls
        libcurl and informs that there was action on this particular socket and
        libcurl can then act on that socket/transfer only and not care about
        any other transfers. (The previous API always had to scan through all
        the existing transfers.)

   The idea is that curl_multi_socket() calls a given callback with information
   about what socket to wait for what action on, and the callback only gets
   called if the status of that socket has changed.

   In the API draft from before, we have a timeout argument on a per socket
   basis and we also allowed curl_multi_socket() to pass in an 'easy handle'
   instead of socket to allow libcurl to shortcut a lookup and work on the
   affected easy handle right away. Both these turned out to be bad ideas.

   The timeout argument was removed from the socket callback since after much
   thinking I came to the conclusion that we really don't want to handle
   timeouts on a per socket basis. We need it on a per transfer (easy handle)
   basis and thus we can't provide it in the callbacks in a nice way. Instead,
   we have to offer a curl_multi_timeout() that returns the largest amount of
   time we should wait before we call the "timeout action" of libcurl, to
   trigger the proper internal timeout action on the affected transfer. To get
   this to work, I added a struct to each easy handle in which we store an
   "expire time" (if any). The structs are then "splay sorted" so that we can
   add and remove times from the linked list and yet somewhat swiftly figure
   out 1 - how long time there is until the next timer expires and 2 - which
   timer (handle) should we take care of now. Of course, the upside of all this
   is that we get a curl_multi_timeout() that should also work with old-style
   applications that use curl_multi_perform().

   The easy handle argument was removed fom the curl_multi_socket() function
   because having it there would require the application to do a socket to easy
   handle conversion on its own. I find it very unlikely that applications
   would want to do that and since libcurl would need such a lookup on its own
   anyway since we didn't want to force applications to do that translation
   code (it would be optional), it seemed like an unnecessary option. I also
   realized that when we use underlying libraries such as c-ares (for DNS
   asynch resolving) there might in fact be more than one transfer waiting for
   action on the same socket and thus it makes the lookup even tricker and even
   less likely to ever get done by applications. Instead I created an internal
   "socket to easy handles" hash table that given a socket (file descriptor)
   returns a list of easy handles that waits for some action on that socket.

   To make libcurl be able to report plain sockets in the socket callback, I
   had to re-organize the internals of the curl_multi_fdset() etc so that the
   conversion from sockets to fd_sets for that function is only done in the
   last step before the data is returned. I also had to extend c-ares to get a
   function that can return plain sockets, as that library too returned only
   fd_sets and that is no longer good enough. The changes done to c-ares have
   been committed and are available in the c-ares CVS repository destined to be
   included in the upcoming c-ares 1.3.1 release.

   The 'shiper' tool is the test application I wrote that uses the new
   curl_multi_socket() in its current state. It seems to be working and it uses
   the API as it is documented and supposed to work. It is still using
   select(), because I needed that during development (like until I had the
   socket hash implemented etc) and because I haven't yet learned how to use
   libevent or similar.

   The hiper/shiper tools are very simple and initiates lots of connections and
   have them running for the test period and then kills them all.

   Since I wasn't done with the implementation until early January I haven't
   had time to run very many measurements and checks, but I have done a few
   runs with up to a few hundred connections (with a single active one). The
   curl_multi_socket() invoke then takes 3-6 microseconds in average (using the
   read-only-1-byte-at-a-time hack). If this number does increase a lot when we
   add connections, it certainly matches my in my opinion very ambitious goal.
   We are now below the 60 microseconds "per socket action" goal. It is
   destined to be somewhat higher the more connections we have since the hash
   table gets more populated and the splay tree will grow etc.

   Some tests at 7000 and 9000 connections showed that the socket hash lookup
   is somewhat of a bottle neck. Its current implementation may be a bit too
   limiting. It simply has a fixed-size array, and on each entry in the array
   it has a linked list with entries. So the hash only checks which list to
   scan through. The code I had used so for used a list with merely 7 slots (as
   that is what the DNS hash uses) but with 7000 connections that would make an
   average of 1000 nodes in each list to run through. I upped that to 97 slots
   (I believe a prime is suitable) and noticed a significant speed increase.  I
   need to reconsider the hash implementation or use a rather large default
   value like this. At 9000 connections I was still below 10us per call.

 Status Right Now

   The curl_multi_socket() API is implemented according to how it is
   documented. The man pages for curl_multi_socket and curl_multi_timeout are
   both committed to CVS and are available online for easy browsing:

     http://curl.haxx.se/libcurl/c/curl_multi_socket.html
     http://curl.haxx.se/libcurl/c/curl_multi_timeout.html

   The hiper-5.patch I made available early morning January 5th, 2006 should
   apply fine on a recent CVS checkout (at the time of this writing curl 7.15.1
   is the latest public curl release but the hiper patch does not apply fine on
   that).

 What is Left for the curl_multi_socket API

   1 - More measuring with more extreme number of connections

   2 - More testing with actual URLs and complete from start to end transfers.

   I'm quite sure we don't set expire times all over in the code properly, so
   there is bound to be some timeout bugs left.

   What it really takes is for me to commit the code and to make an official
   release with it so that we get people "out there" to help out testing it.

 What is Left for project Hiper

   1 - Add HTTP pipelining support

   2 - Add a zero (or at least close to zero) copy interface

   Neither of these points have been planned or detailed exactly how they will
   be implemented.

 Roadmap Ahead

   I plan and hope to return to full-time hiper work later on this spring or
   possibly summer to continue where I pause now. Of course some spare time
   might also be spent until then to get us moving forward.

 ---------------------------------------------------------------------------

 April 11, 2006

  While sitting staring on my screen trying to write up a *nice* sample script
  using libevent, it strikes me that since libevent is pretty much based around
  its structs that you setup for each event/file descriptor, my application
  wants to figure out the correct struct that is associted with the file
  descriptor that libcurl provides in the socket callback.

  This feels like an operation most applications will need when using the
  multi_socket API, so it feels like I should better try to figure out a decent
  way to offer this basic functionality already in libcurl - and the fact that
  we already have the file descriptors in a hash we can probably just as well
  extend it somewhat and store some custom pointers as well.

  We need to offer the app a way to set a private pointer to be associated with
  the particular file descriptor, and then be able to provide that pointer on
  subsequent callback calls.

 ---------------------------------------------------------------------------

 April 20, 2006

  I was wrong when I previously claimed we could have more than one easy handle
  using the same socket. I've cleaned up and simplified code now to adjust to
  this.

 ---------------------------------------------------------------------------

 July 9, 2006

  TODO: We need to alter how we use c-ares for getting info about its sockets,
  as c-ares now provides a callback approach very similar to how libcurl is
  about to work.

  I'm adding a function called curl_multi_assign() that will set a private
  pointer added to the internal libcurl hash table for the particular socket
  passed in to this function:

  CURLMcode curl_multi_assign(CURLM *multi_handle,
                              curl_socket_t sockfd,
                              void *sockp);

  'sockp' being a custom pointer set by the application to be associated with
  this socket. The socket has to be already existing and in-use by libcurl,
  like having already called the callback telling about its existance.

  The set hashp pointer will then be passed on to the callback in upcoming
  calls when this same socket is used (in the brand new 'socketp' argument).

 ---------------------------------------------------------------------------

 July 30, 2006

  Shockingly stupid (of me not having realized this before), but we really need
  to add a 'running_handles' argument to the curl_multi_socket() and
  curl_multi_socket_all() prototypes so that the caller can get to know when
  all the transfers are actually done!
	Date: January 5, 2006
	Author: Daniel Stenberg

	Status of project Hiper - high performance libcurl modifications
	================================================================

	What is Hiper

	You won't find such a description in this document. See
	http://curl.haxx.se/libcurl/hiper/ for further details.

	Live Progress Info

	During my work, I've posted occational updates on the curl-library mailing
	list but more importantly done frequent updates of
	http://curl.haxx.se/libcurl/hiper/schedule.html

	Schedule

	I took time off my regular job during Decemember 2005 and the first week of
	January 2006 to work on hiper full-time.

	Step 1 - Measure the Existing Solution

	I started full-time work on project Hiper on December 1st 2005. I began by
	putting together a test application that used the existing API to allow me
	to properly and with accuracy measure execution and transfer speeds when
	doing a large amount of transfers.

	I soon discovered that it was impossible to do any sensible measurements by
	using live and actual URLs since the transfers were too unrelialble and
	uncontrolled. I then enhanced the current HTTP server in the curl test suite
	and made that support a large amount of transfers and some extra magic
	"commands" that would make the server either just sit "idle" or "stream"
	(continuously sending data in a never-ending stream). I then wrote up two
	files using the curl test suite file format and by acessing the properly
	formatted URLs on my localhost the HTTP server would either run "idle" or
	run "stream".

	Having this working, I patched libcurl to always only recv() a single byte
	off the network each time, just to make sure that the time spent on reading
	data is constant and never very long.

	I adjusted the test application (actually called 'hiper') to create Y idle
	transfers and Z stream transfers, had it run for N seconds and then quit and
	produce a summary on stdout. Now I got very solid and repeatable results. I
	started to run repeated tests and save the results when I ran into the
	dreaded 1024 socket maximum limit.

	One side of the problem is that the fd_set type only allows 1024 file
	descriptors (on my Linux), which I had to solve by simply making my own type
	with room for more connections and do ugly typecasts in the code. The other
	side of the problem is that user applications have a limit imposed by the
	system on the maximum amount of file descriptors it can have open and I had
	to work around that by writing a special tool that runs setuid root that
	increases the limit, downgrades to a normal user again and then run the
	command line of your choice. This second approach has to be used for both
	'hiper' and the test HTTP server. (You need to build the HTTP server with
	CURL_SWS_FORK_ENABLED defined to have it do forks since it isn't desirable
	to do so when running the normal curl tests.)

	Now I could run my test program without problems. I decided to run the tests
	with 1 stream connection and a varying amount of idle ones. I did 1001,
	2001, 3001, 5001 and 9001 connections and measured how long select() and
	curl_multi_perform() (including the curl_multi_fdset() call) would take in
	average, over a period of 20 seconds. I ran each test 5-6 times and I used
	the average time of all the runs.

	The times in number of microseconds:

	Connections multi_perform select
	1001 3504 951
	2001 7606 1988
	3001 11045 2715
	5001 16406 4024
	9001 32147 8030

	Test system
	CPU: Athlon XP 2800
	RAM: 1 GB
	Linux: 2.6
	glibc: 2.3.5
	libcurl: 7.15.1

	The only reason I stopped at 9001 connections is that my test machine ran
	out of avaiable memory by then as I ran the test server on the same machine,
	and I didn't want to risk the test result accuracy by having it start using
	the swap during the tests.

	It means that at 9000 connections we spend 40ms for each socket action, even
	when only one socket ever have action.

	With these 32000 microseconds curl_multi_perform() takes for 9000
	connections, it loops 18000 laps which makes less than 2 microseconds per
	lap. (Of course counting time/laps is an oversimplification, but anyway.)
	Hopefully we should achieve less than 10 microseconds for each call to
	curl_multi_socket() for an active connection.

	The timing graph displayed on the libevent site (duplicated on the hiper
	project page) suggests that libevent is pretty much fixed at 50 microseconds
	(although I don't know what test box was used in their testing, we can
	compare the select()-times from my tests and see that they are at least
	resonably close).

	Summing up, the current ~40 ms spent at 9000 connections could then possibly
	be lowered to something around 60 us!

	Step 2 - Implement curl_multi_socket API

	Most of the design decisions and debates about this new API have already
	been held on the curl-library mailing list a long time ago so I had a basic
	idea on what approach to use. The main ideas of the new API are simply:

	1 - The application can use whatever event system it likes as it gets info
	from libcurl about what file descriptors libcurl waits for what action
	on. (The previous API returns fd_sets which is very select()-centric).

	2 - When the application discovers action on a single socket, it calls
	libcurl and informs that there was action on this particular socket and
	libcurl can then act on that socket/transfer only and not care about
	any other transfers. (The previous API always had to scan through all
	the existing transfers.)

	The idea is that curl_multi_socket() calls a given callback with information
	about what socket to wait for what action on, and the callback only gets
	called if the status of that socket has changed.

	In the API draft from before, we have a timeout argument on a per socket
	basis and we also allowed curl_multi_socket() to pass in an 'easy handle'
	instead of socket to allow libcurl to shortcut a lookup and work on the
	affected easy handle right away. Both these turned out to be bad ideas.

	The timeout argument was removed from the socket callback since after much
	thinking I came to the conclusion that we really don't want to handle
	timeouts on a per socket basis. We need it on a per transfer (easy handle)
	basis and thus we can't provide it in the callbacks in a nice way. Instead,
	we have to offer a curl_multi_timeout() that returns the largest amount of
	time we should wait before we call the "timeout action" of libcurl, to
	trigger the proper internal timeout action on the affected transfer. To get
	this to work, I added a struct to each easy handle in which we store an
	"expire time" (if any). The structs are then "splay sorted" so that we can
	add and remove times from the linked list and yet somewhat swiftly figure
	out 1 - how long time there is until the next timer expires and 2 - which
	timer (handle) should we take care of now. Of course, the upside of all this
	is that we get a curl_multi_timeout() that should also work with old-style
	applications that use curl_multi_perform().

	The easy handle argument was removed fom the curl_multi_socket() function
	because having it there would require the application to do a socket to easy
	handle conversion on its own. I find it very unlikely that applications
	would want to do that and since libcurl would need such a lookup on its own
	anyway since we didn't want to force applications to do that translation
	code (it would be optional), it seemed like an unnecessary option. I also
	realized that when we use underlying libraries such as c-ares (for DNS
	asynch resolving) there might in fact be more than one transfer waiting for
	action on the same socket and thus it makes the lookup even tricker and even
	less likely to ever get done by applications. Instead I created an internal
	"socket to easy handles" hash table that given a socket (file descriptor)
	returns a list of easy handles that waits for some action on that socket.

	To make libcurl be able to report plain sockets in the socket callback, I
	had to re-organize the internals of the curl_multi_fdset() etc so that the
	conversion from sockets to fd_sets for that function is only done in the
	last step before the data is returned. I also had to extend c-ares to get a
	function that can return plain sockets, as that library too returned only
	fd_sets and that is no longer good enough. The changes done to c-ares have
	been committed and are available in the c-ares CVS repository destined to be
	included in the upcoming c-ares 1.3.1 release.

	The 'shiper' tool is the test application I wrote that uses the new
	curl_multi_socket() in its current state. It seems to be working and it uses
	the API as it is documented and supposed to work. It is still using
	select(), because I needed that during development (like until I had the
	socket hash implemented etc) and because I haven't yet learned how to use
	libevent or similar.

	The hiper/shiper tools are very simple and initiates lots of connections and
	have them running for the test period and then kills them all.

	Since I wasn't done with the implementation until early January I haven't
	had time to run very many measurements and checks, but I have done a few
	runs with up to a few hundred connections (with a single active one). The
	curl_multi_socket() invoke then takes 3-6 microseconds in average (using the
	read-only-1-byte-at-a-time hack). If this number does increase a lot when we
	add connections, it certainly matches my in my opinion very ambitious goal.
	We are now below the 60 microseconds "per socket action" goal. It is
	destined to be somewhat higher the more connections we have since the hash
	table gets more populated and the splay tree will grow etc.

	Some tests at 7000 and 9000 connections showed that the socket hash lookup
	is somewhat of a bottle neck. Its current implementation may be a bit too
	limiting. It simply has a fixed-size array, and on each entry in the array
	it has a linked list with entries. So the hash only checks which list to
	scan through. The code I had used so for used a list with merely 7 slots (as
	that is what the DNS hash uses) but with 7000 connections that would make an
	average of 1000 nodes in each list to run through. I upped that to 97 slots
	(I believe a prime is suitable) and noticed a significant speed increase. I
	need to reconsider the hash implementation or use a rather large default
	value like this. At 9000 connections I was still below 10us per call.

	Status Right Now

	The curl_multi_socket() API is implemented according to how it is
	documented. The man pages for curl_multi_socket and curl_multi_timeout are
	both committed to CVS and are available online for easy browsing:

	http://curl.haxx.se/libcurl/c/curl_multi_socket.html
	http://curl.haxx.se/libcurl/c/curl_multi_timeout.html

	The hiper-5.patch I made available early morning January 5th, 2006 should
	apply fine on a recent CVS checkout (at the time of this writing curl 7.15.1
	is the latest public curl release but the hiper patch does not apply fine on
	that).

	What is Left for the curl_multi_socket API

	1 - More measuring with more extreme number of connections

	2 - More testing with actual URLs and complete from start to end transfers.

	I'm quite sure we don't set expire times all over in the code properly, so
	there is bound to be some timeout bugs left.

	What it really takes is for me to commit the code and to make an official
	release with it so that we get people "out there" to help out testing it.

	What is Left for project Hiper

	1 - Add HTTP pipelining support

	2 - Add a zero (or at least close to zero) copy interface

	Neither of these points have been planned or detailed exactly how they will
	be implemented.

	Roadmap Ahead

	I plan and hope to return to full-time hiper work later on this spring or
	possibly summer to continue where I pause now. Of course some spare time
	might also be spent until then to get us moving forward.

	---------------------------------------------------------------------------

	April 11, 2006

	While sitting staring on my screen trying to write up a nice sample script
	using libevent, it strikes me that since libevent is pretty much based around
	its structs that you setup for each event/file descriptor, my application
	wants to figure out the correct struct that is associted with the file
	descriptor that libcurl provides in the socket callback.

	This feels like an operation most applications will need when using the
	multi_socket API, so it feels like I should better try to figure out a decent
	way to offer this basic functionality already in libcurl - and the fact that
	we already have the file descriptors in a hash we can probably just as well
	extend it somewhat and store some custom pointers as well.

	We need to offer the app a way to set a private pointer to be associated with
	the particular file descriptor, and then be able to provide that pointer on
	subsequent callback calls.

	---------------------------------------------------------------------------

	April 20, 2006

	I was wrong when I previously claimed we could have more than one easy handle
	using the same socket. I've cleaned up and simplified code now to adjust to
	this.

	---------------------------------------------------------------------------

	July 9, 2006

	TODO: We need to alter how we use c-ares for getting info about its sockets,
	as c-ares now provides a callback approach very similar to how libcurl is
	about to work.

	I'm adding a function called curl_multi_assign() that will set a private
	pointer added to the internal libcurl hash table for the particular socket
	passed in to this function:

	CURLMcode curl_multi_assign(CURLM *multi_handle,
	curl_socket_t sockfd,
	void *sockp);

	'sockp' being a custom pointer set by the application to be associated with
	this socket. The socket has to be already existing and in-use by libcurl,
	like having already called the callback telling about its existance.

	The set hashp pointer will then be passed on to the callback in upcoming
	calls when this same socket is used (in the brand new 'socketp' argument).

	---------------------------------------------------------------------------

	July 30, 2006

	Shockingly stupid (of me not having realized this before), but we really need
	to add a 'running_handles' argument to the curl_multi_socket() and
	curl_multi_socket_all() prototypes so that the caller can get to know when
	all the transfers are actually done!