|                                   _   _ ____  _ |
|                               ___| | | |  _ \| | |
|                              / __| | | | |_) | | |
|                             | (__| |_| |  _ <| |___ |
|                              \___|\___/|_| \_\_____| |
| |
| PROGRAMMING WITH LIBCURL |
| |
| About this Document |
| |
| This document attempts to describe the general principles and some basic |
| approaches to consider when programming with libcurl. The text focuses |
| mainly on the C interface but should apply fairly well to other interfaces |
| too, as they usually follow the C one pretty closely. |
| |
| This document will refer to 'the user' as the person writing the source code |
| that uses libcurl. That would probably be you or someone in your position. |
| What is generally referred to as 'the program' is the collected source code |
| that you write that uses libcurl for transfers. The program is outside |
| libcurl and libcurl is outside of the program. |
| |
| To get more details on all options and functions described herein, please |
| refer to their respective man pages. |
| |
| Building |
| |
| There are many different ways to build C programs. This chapter will assume a |
| unix-style build process. If you use a different build system, you can still |
| read this to get general information that may apply to your environment as |
| well. |
| |
| Compiling the Program |
| |
| Your compiler needs to know where the libcurl headers are |
| located. Therefore you must set your compiler's include path to point to |
| the directory where you installed them. The 'curl-config'[3] tool can be |
| used to get this information: |
| |
| $ curl-config --cflags |
| |
| Linking the Program with libcurl |
| |
| Once you have compiled the program, you need to link your object files to |
| create a single executable. For that to succeed, you need to link with |
| libcurl and possibly also with other libraries that libcurl itself depends |
| on, such as the OpenSSL libraries. Some standard OS libraries may also be |
| needed on the command line. To figure out which flags to use, once again |
| the 'curl-config' tool comes to the rescue: |
| |
| $ curl-config --libs |
| |
| SSL or Not |
| |
| libcurl can be built and customized in many ways. One of the things that |
| varies between different libraries and builds is the support for SSL-based |
| transfers, like HTTPS and FTPS. If OpenSSL was detected properly at |
| build-time, libcurl will be built with SSL support. To figure out if an |
| installed libcurl has been built with SSL support enabled, use |
| 'curl-config' like this: |
| |
| $ curl-config --feature |
| |
| If SSL is supported, the keyword 'SSL' will be written to stdout, |
| possibly together with a few other features that can be switched on or off |
| in different libcurl builds. |
| |
| |
| Portable Code in a Portable World |
| |
| The people behind libcurl have put considerable effort into making libcurl |
| work on a large number of different operating systems and environments. |
| |
| You program libcurl the same way on all platforms that libcurl runs on. There |
| are only a very few minor considerations that differ. If you just make sure |
| to write your code portably enough, you may very well create yourself a very |
| portable program. libcurl shouldn't stop you from that. |
| |
| |
| Global Preparation |
| |
| The program must initialize some of the libcurl functionality globally. That |
| means it should be done exactly once, no matter how many times you intend to |
| use the library. Once for your program's entire life time. This is done using |
| |
| curl_global_init() |
| |
| and it takes one parameter which is a bit pattern that tells libcurl what to |
| initialize. Using CURL_GLOBAL_ALL will make it initialize all known internal |
| sub modules, and might be a good default option. The two bits that are |
| currently specified are: |
| |
| CURL_GLOBAL_WIN32 which only does anything on Windows machines. When used on |
| a Windows machine, it'll make libcurl initialize the win32 socket |
| stuff. Without having that initialized properly, your program cannot use |
| sockets properly. You should only do this once for each application, so if |
| your program already does this or if another library in use does it, you |
| should not tell libcurl to do this as well. |
| |
| CURL_GLOBAL_SSL which only does anything on libcurls compiled and built |
| SSL-enabled. On these systems, this will make libcurl initialize OpenSSL |
| properly for this application. This only needs to be done once for each |
| application so if your program or another library already does this, this |
| bit should not be needed. |
| |
| libcurl has a default protection mechanism that detects if curl_global_init() |
| hasn't been called by the time curl_easy_perform() is called and if that is |
| the case, libcurl runs the function itself with a guessed bit pattern. Please |
| note that depending solely on this is considered neither nice nor very good. |
| |
| When the program no longer uses libcurl, it should call |
| curl_global_cleanup(), which is the opposite of the init call. It will then |
| do the reverse operations to clean up the resources the curl_global_init() |
| call initialized. |
| |
| Repeated calls to curl_global_init() and curl_global_cleanup() should be |
| avoided. They should be called once each. |
| |
| Handle the Easy libcurl |
| |
| libcurl version 7 is oriented around the so called easy interface. All |
| operations in the easy interface are prefixed with 'curl_easy'. |
| |
| Future libcurls will also offer the multi interface. More about that |
| interface, what it is targeted for and how to use it is still only debated on |
| the libcurl mailing list and developer web pages. Join up to discuss and |
| figure out! |
| |
| To use the easy interface, you must first create yourself an easy handle. You |
| need one handle for each easy session you want to perform. Basically, you |
| should use one handle for every thread you plan to use for transferring. You |
| must never share the same handle in multiple threads. |
| |
| Get an easy handle with |
| |
| easyhandle = curl_easy_init(); |
| |
| It returns an easy handle. Using that you proceed to the next step: setting |
| up your preferred actions. A handle is just a logical entity for the upcoming |
| transfer or series of transfers. |
| |
| You set properties and options for this handle using curl_easy_setopt(). They |
| control how the subsequent transfer or transfers will be made. Options remain |
| set in the handle until set again to something different. Thus, multiple |
| requests using the same handle will use the same options. |
| |
| Many of the options you set in libcurl are "strings", pointers to data |
| terminated with a zero byte. Keep in mind that when you set strings with |
| curl_easy_setopt(), libcurl will not copy the data. It will merely point to |
| the data. You MUST make sure that the data remains available for libcurl to |
| use until finished or until you use the same option again to point to |
| something else. |
| |
| One of the most basic properties to set in the handle is the URL. You set |
| your preferred URL to transfer with CURLOPT_URL in a manner similar to: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_URL, "http://curl.haxx.se/"); |
| |
| Let's assume for a while that you want to receive data as the URL identifies |
| a remote resource you want to get here. Since you write a sort of application |
| that needs this transfer, I assume that you would like to get the data passed |
| to you directly instead of simply getting it passed to stdout. So, you write |
| your own function that matches this prototype: |
| |
| size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp); |
| |
| You tell libcurl to pass all data to this function by issuing a call |
| similar to this: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, write_data); |
| |
| You can control what data your function gets in the fourth argument by |
| setting another property: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_FILE, &internal_struct); |
| |
| Using that property, you can easily pass local data between your application |
| and the function that gets invoked by libcurl. libcurl itself won't touch the |
| data you pass with CURLOPT_FILE. |
| |
| libcurl offers its own default internal callback that'll take care of the |
| data if you don't set the callback with CURLOPT_WRITEFUNCTION. It will then |
| simply output the received data to stdout. You can have the default callback |
| write the data to a different file handle by passing a 'FILE *' to a file |
| opened for writing with the CURLOPT_FILE option. |
| |
| Now, we need to take a step back and take a deep breath. Here's one of those |
| rare platform-dependent nitpicks. Did you spot it? On some platforms[2], |
| libcurl won't be able to operate on files opened by the program. Thus, if you |
| use the default callback and pass in an open file with CURLOPT_FILE, it |
| will crash. You should therefore avoid this to make your program run fine |
| virtually everywhere. |
| |
| There are of course many more options you can set, and we'll get back to a |
| few of them later. Let's instead continue to the actual transfer: |
| |
| success = curl_easy_perform(easyhandle); |
| |
| The curl_easy_perform() will connect to the remote site, do the necessary |
| commands and receive the transfer. Whenever it receives data, it calls the |
| callback function we previously set. The function may get one byte at a time, |
| or it may get many kilobytes at once. libcurl delivers as much as possible as |
| often as possible. Your callback function should return the number of bytes |
| it "took care of". If that is not the exact same amount of bytes that was |
| passed to it, libcurl will abort the operation and return with an error code. |
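| |
| As a minimal sketch of such a callback, assume you want the whole body |
| collected in memory. The 'struct membuf' type and its field names are made |
| up for this illustration and are not part of libcurl. The function follows |
| the write_data prototype shown earlier and expects a pointer to the struct |
| as its fourth argument, which is what you would set with CURLOPT_FILE. |
| |
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct for this sketch: a growing, zero-terminated buffer. */
struct membuf {
  char *data;   /* allocated storage, or NULL before the first chunk */
  size_t len;   /* number of bytes stored so far */
};

/* Same shape as the write callback prototype in the text. */
size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp)
{
  struct membuf *mem = (struct membuf *)userp;
  size_t chunk = size * nmemb;
  char *grown = (char *)realloc(mem->data, mem->len + chunk + 1);

  if(!grown)
    return 0; /* nothing handled; for a non-empty chunk this is an error */

  memcpy(grown + mem->len, buffer, chunk);
  mem->data = grown;
  mem->len += chunk;
  mem->data[mem->len] = '\0'; /* keep it usable as a C string */

  return chunk; /* we took care of every byte we were given */
}
```
| |
| Returning 'chunk' tells libcurl that every byte was taken care of. The |
| allocation failure path returns 0 instead, which differs from the amount |
| passed in for any non-empty chunk and therefore aborts the transfer. |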
| |
| When the transfer is complete, the function returns a return code that |
| informs you if it succeeded in its mission or not. If a return code isn't |
| enough for you, you can use the CURLOPT_ERRORBUFFER to point libcurl to a |
| buffer of yours where it'll store a human readable error message as well. |
| |
| If you then want to transfer another file, the handle is ready to be used |
| again. Mind you, it is even preferred that you re-use an existing handle if |
| you intend to make another transfer. libcurl will then attempt to re-use the |
| previous connection. |
| |
| |
| When It Doesn't Work |
| |
| There will always be times when the transfer fails for some reason. You might |
| have set the wrong libcurl option or misunderstood what the libcurl option |
| actually does, or the remote server might return non-standard replies that |
| confuse the library which then confuses your program. |
| |
| There's one golden rule when these things occur: set the CURLOPT_VERBOSE |
| option to TRUE. It'll cause the library to spew out the entire protocol |
| details it sends, some internal info and some received protocol data as well |
| (especially when using FTP). If you're using HTTP, adding the headers in the |
| received output to study is also a clever way to get a better understanding |
| of why the server behaves the way it does. Include headers in the normal body |
| output with CURLOPT_HEADER set to TRUE. |
| |
| Of course there are bugs left. We need to get to know about them to be able |
| to fix them, so we're quite dependent on your bug reports! When you do report |
| suspected bugs in libcurl, please include as many details as you possibly |
| can: a protocol dump that CURLOPT_VERBOSE produces, library version, as much |
| as possible of your code that uses libcurl, operating system name and |
| version, compiler name and version etc. |
| |
| Getting some in-depth knowledge about the protocols involved is never wrong, |
| and if you're trying to do funny things, you might very well understand |
| libcurl and how to use it better if you study the appropriate RFC documents |
| at least briefly. |
| |
| |
| Upload Data to a Remote Site |
| |
| libcurl tries to keep a protocol independent approach to most transfers, thus |
| uploading to a remote FTP site is very similar to uploading data to a HTTP |
| server with a PUT request. |
| |
| Of course, first you either create an easy handle or you re-use an existing |
| one. Then you set the URL to operate on just like before. This is the remote |
| URL that we will now upload to. |
| |
| Since we write an application, we most likely want libcurl to get the upload |
| data by asking us for it. To make it do that, we set the read callback and |
| the custom pointer libcurl will pass to our read callback. The read callback |
| should have a prototype similar to: |
| |
| size_t function(char *bufptr, size_t size, size_t nitems, void *userp); |
| |
| Where bufptr is the pointer to a buffer we fill in with data to upload and |
| size*nitems is the size of the buffer and therefore also the maximum amount |
| of data we can return to libcurl in this call. The 'userp' pointer is the |
| custom pointer we set to point to a struct of ours to pass private data |
| between the application and the callback. |
| |
| curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function); |
| |
| curl_easy_setopt(easyhandle, CURLOPT_INFILE, &filedata); |
| |
| Tell libcurl that we want to upload: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, TRUE); |
| |
| A few protocols won't behave properly when uploads are done without any prior |
| knowledge of the expected file size. HTTP PUT is one example [1]. So, set the |
| upload file size using the CURLOPT_INFILESIZE like this: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE, file_size); |
| |
| When you call curl_easy_perform() this time, it'll perform all the necessary |
| operations and when it has started the upload it'll call your supplied |
| callback to get the data to upload. The program should return as much data as |
| possible in every invocation, as that is likely to make the upload perform as |
| fast as possible. The callback should return the number of bytes it wrote in |
| the buffer. Returning 0 will signal the end of the upload. |
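| |
| A read callback along these lines could feed an upload from data that is |
| already in memory. This is only a sketch: 'struct upload_source' and its |
| fields are hypothetical names, and a pointer to such a struct is what you |
| would pass with CURLOPT_INFILE. |
| |
```c
#include <string.h>

/* Hypothetical struct for this sketch: what is left to upload. */
struct upload_source {
  const char *ptr;  /* next byte to send */
  size_t left;      /* bytes not yet handed to libcurl */
};

/* Same shape as the read callback prototype in the text. */
size_t read_function(char *bufptr, size_t size, size_t nitems, void *userp)
{
  struct upload_source *src = (struct upload_source *)userp;
  size_t room = size * nitems;                      /* buffer capacity */
  size_t n = src->left < room ? src->left : room;   /* fill as much as fits */

  if(n) {
    memcpy(bufptr, src->ptr, n);
    src->ptr += n;
    src->left -= n;
  }
  return n; /* 0 once nothing is left, which ends the upload */
}
```
| |
| Each call fills the buffer as far as the remaining data allows, and the |
| final call returns 0 to signal the end of the upload. |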
| |
| |
| Passwords |
| |
| Many protocols use or even require that a user name and password be provided |
| to be able to download or upload the data of your choice. libcurl offers |
| several ways to specify them. |
| |
| Most protocols let you specify the name and password in the URL |
| itself. libcurl will detect this and use them accordingly. This is written |
| like this: |
| |
| protocol://user:password@example.com/path/ |
| |
| If you need any odd letters in your user name or password, you should enter |
| them URL encoded, as %XX where XX is a two-digit hexadecimal number. |
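| |
| The encoding rule can be sketched in plain C as follows. The helper name |
| url_encode() and its interface are assumptions made for this example, not a |
| libcurl function. It leaves letters, digits and the characters '-', '_', |
| '.' and '~' alone and writes every other byte as %XX. |
| |
```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: percent-encode 'in' into 'out', which has room for
   'outlen' bytes including the terminating zero. Returns the encoded
   length, or -1 if 'out' is too small. */
int url_encode(const char *in, char *out, size_t outlen)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t o = 0;

  if(outlen == 0)
    return -1;

  for(; *in; in++) {
    unsigned char c = (unsigned char)*in;
    if(isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
      if(o + 1 >= outlen)
        return -1;
      out[o++] = (char)c;          /* safe character, copy as-is */
    }
    else {
      if(o + 3 >= outlen)
        return -1;
      out[o++] = '%';              /* anything else becomes %XX */
      out[o++] = hex[c >> 4];
      out[o++] = hex[c & 15];
    }
  }
  out[o] = '\0';
  return (int)o;
}
```
| |
| With this helper, a password such as "p@ss:w/rd" would be written as |
| "p%40ss%3Aw%2Frd" before being placed in the URL. |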
| |
| libcurl also provides options to set various passwords. The user name and |
| password as shown embedded in the URL can instead get set with the |
| CURLOPT_USERPWD option. The argument passed to libcurl should be a char * to |
| a string in the format "user:password". In a manner like this: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_USERPWD, "myname:thesecret"); |
| |
| Another case where name and password might be needed at times, is for those |
| users who need to authenticate themselves to a proxy they use. libcurl offers |
| another option for this, the CURLOPT_PROXYUSERPWD. It is used quite similarly |
| to the CURLOPT_USERPWD option like this: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "myname:thesecret"); |
| |
| There's a long time unix "standard" way of storing ftp user names and |
| passwords, namely in the $HOME/.netrc file. The file should be made private |
| so that only the user may read it (see also the "Security Considerations" |
| chapter), as it might contain the password in plain text. libcurl has the |
| ability to use this file to figure out what set of user name and password to |
| use for a particular host. As an extension to the normal functionality, |
| libcurl also supports this file for non-FTP protocols such as HTTP. To make |
| curl use this file, use the CURLOPT_NETRC option: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_NETRC, TRUE); |
| |
| And a very basic example of what such a .netrc file may look like: |
| |
| machine myhost.mydomain.com |
| login userlogin |
| password secretword |
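| |
| To illustrate how the entries in such a file are organized, here is a |
| simplified lookup written for this guide. It is not libcurl's own parser: |
| netrc_lookup() is a hypothetical helper that only understands the |
| 'machine', 'login' and 'password' keywords, each followed by one value of |
| at most 63 characters, and it works on text already read into memory. |
| |
```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan .netrc-style text for 'host' and copy its
   login and password. Returns 1 if the machine was found, 0 otherwise. */
int netrc_lookup(const char *text, const char *host,
                 char login[64], char password[64])
{
  char key[64], value[64];
  int used = 0;
  int found = 0;

  login[0] = password[0] = '\0';

  /* read one "keyword value" pair at a time */
  while(sscanf(text, " %63s %63s%n", key, value, &used) == 2) {
    text += used;
    if(!strcmp(key, "machine")) {
      if(found)
        break;                       /* the next entry starts, we are done */
      found = !strcmp(value, host);
    }
    else if(found && !strcmp(key, "login"))
      strcpy(login, value);          /* value is at most 63 chars + zero */
    else if(found && !strcmp(key, "password"))
      strcpy(password, value);
  }
  return found;
}
```
| |
| Fed the example file above and the host "myhost.mydomain.com", this sketch |
| would report the login "userlogin" and the password "secretword". |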
| |
| All these examples have been cases where the password has been optional, or |
| at least you could leave it out and have libcurl attempt to do its job |
| without it. There are times when the password isn't optional, like when |
| you're using an SSL private key for secure transfers. |
| |
| You can in this situation either pass a password to libcurl to use to unlock |
| the private key, or you can let libcurl prompt the user for it. If you prefer |
| to ask the user, then you can provide your own callback function that will be |
| called when libcurl wants the password. That way, you can control how the |
| question will appear to the user. |
| |
| To pass the known private key password to libcurl: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_SSLKEYPASSWD, "keypassword"); |
| |
| To make a password callback: |
| |
| int enter_passwd(void *ourp, const char *prompt, char *buffer, int len); |
| curl_easy_setopt(easyhandle, CURLOPT_PASSWDFUNCTION, enter_passwd); |
| |
| |
| HTTP POSTing |
| |
| We get many questions regarding how to issue HTTP POSTs with libcurl the |
| proper way. This chapter will thus include examples using both of the |
| versions of HTTP POST that libcurl supports. |
| |
| The first version is the simple POST, the most common version, that most HTML |
| pages using the <form> tag use. We provide a pointer to the data and tell |
| libcurl to post it all to the remote site: |
| |
| char *data="name=daniel&project=curl"; |
| curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, data); |
| curl_easy_setopt(easyhandle, CURLOPT_URL, "http://posthere.com/"); |
| |
| curl_easy_perform(easyhandle); /* post away! */ |
| |
| Simple enough, huh? Since you set the POST options with the |
| CURLOPT_POSTFIELDS, this automatically switches the handle to use POST in the |
| upcoming request. |
| |
| Ok, so what if you want to post binary data that also requires you to set the |
| Content-Type: header of the post? Well, binary posts prevent libcurl from |
| being able to do strlen() on the data to figure out the size, so therefore we |
| must tell libcurl the size of the post data. Setting headers in libcurl |
| requests is done in a generic way, by building a list of our own headers and |
| then passing that list to libcurl. |
| |
| struct curl_slist *headers=NULL; |
| headers = curl_slist_append(headers, "Content-Type: text/xml"); |
| |
| /* post binary data */ |
| curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, binaryptr); |
| |
| /* set the size of the postfields data */ |
| curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDSIZE, 23); |
| |
| /* pass our list of custom made headers */ |
| curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers); |
| |
| curl_easy_perform(easyhandle); /* post away! */ |
| |
| curl_slist_free_all(headers); /* free the header list */ |
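| |
| Why the size has to be given explicitly can be seen without libcurl at |
| all. In this small sketch (the names are invented for the illustration), |
| strlen() stops at the zero byte in the middle of the data and so reports |
| fewer bytes than should be posted. |
| |
```c
#include <string.h>

/* Five bytes of binary data with a zero byte in the middle. */
static const char binary_post[] = { 'a', 'b', '\0', 'c', 'd' };

/* What strlen() reports: it stops at the first zero byte. */
size_t guessed_size(void)
{
  return strlen(binary_post);
}

/* What actually has to be posted: the full size of the array. */
size_t real_size(void)
{
  return sizeof(binary_post);
}
```
| |
| The value from real_size() is the kind of number you would hand to |
| CURLOPT_POSTFIELDSIZE for such data. |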
| |
| While the simple examples above cover the majority of all cases where HTTP |
| POST operations are required, they don't do multipart formposts. Multipart |
| formposts were introduced as a better way to post (possibly large) binary |
| data and were first documented in RFC 1867. They're called multipart |
| because they're built by a chain of parts, each being a single unit. Each |
| part has its own name and contents. You can in fact create and post a |
| multipart formpost with the regular libcurl POST support described above, but |
| that would require that you build a formpost yourself and provide it to |
| libcurl. To make that easier, libcurl provides curl_formadd(). Using this |
| function, you add parts to the form. When you're done adding parts, you post |
| the whole form. |
| |
| The following example sets two simple text parts with plain textual contents, |
| and then a file with binary contents, and uploads the whole thing. |
| |
| struct HttpPost *post=NULL; |
| struct HttpPost *last=NULL; |
| curl_formadd(&post, &last, |
| CURLFORM_COPYNAME, "name", |
| CURLFORM_COPYCONTENTS, "daniel", CURLFORM_END); |
| curl_formadd(&post, &last, |
| CURLFORM_COPYNAME, "project", |
| CURLFORM_COPYCONTENTS, "curl", CURLFORM_END); |
| curl_formadd(&post, &last, |
| CURLFORM_COPYNAME, "logotype-image", |
| CURLFORM_FILECONTENT, "curl.png", CURLFORM_END); |
| |
| /* Set the form info */ |
| curl_easy_setopt(easyhandle, CURLOPT_HTTPPOST, post); |
| |
| curl_easy_perform(easyhandle); /* post away! */ |
| |
| /* free the post data again */ |
| curl_formfree(post); |
| |
| Multipart formposts are chains of parts using MIME-style separators and |
| headers. It means that each one of these separate parts gets a few headers |
| set that describe the individual content-type, size etc. To enable your |
| application to handicraft this formpost even more, libcurl allows you to |
| supply your own set of custom headers to such an individual form part. You |
| can of course supply headers to as many parts as you like, but this little |
| example will show how you set headers to one specific part when you add that |
| to the post handle: |
| |
| struct curl_slist *headers=NULL; |
| headers = curl_slist_append(headers, "Content-Type: text/xml"); |
| |
| curl_formadd(&post, &last, |
| CURLFORM_COPYNAME, "logotype-image", |
| CURLFORM_FILECONTENT, "curl.xml", |
| CURLFORM_CONTENTHEADER, headers, |
| CURLFORM_END); |
| |
| curl_easy_perform(easyhandle); /* post away! */ |
| |
| curl_formfree(post); /* free post */ |
| curl_slist_free_all(headers); /* free custom header list */ |
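| |
| For a feeling of what such a chain of parts looks like on the wire, this |
| sketch formats a single text part by hand, following the |
| multipart/form-data layout. format_text_part() is a hypothetical helper and |
| the boundary string is chosen by the caller. curl_formadd() does all of |
| this, and more, for you. |
| |
```c
#include <stdio.h>

/* Hypothetical helper: format one text part of a multipart formpost into
   'out'. Returns the number of bytes written (not counting the final
   zero), or -1 if the part did not fit in 'outlen' bytes. */
int format_text_part(char *out, size_t outlen, const char *boundary,
                     const char *name, const char *contents)
{
  int n = snprintf(out, outlen,
                   "--%s\r\n"                 /* separator line */
                   "Content-Disposition: form-data; name=\"%s\"\r\n"
                   "\r\n"                     /* end of the part's headers */
                   "%s\r\n",                  /* the part's contents */
                   boundary, name, contents);

  if(n < 0 || (size_t)n >= outlen)
    return -1;
  return n;
}
```
| |
| A complete request body is a sequence of such parts followed by a closing |
| line made of "--", the boundary and "--" again. |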
| |
| All options on an easyhandle are "sticky": they remain the same until |
| changed, even across calls to curl_easy_perform(). You may therefore need to |
| tell curl to go back to a plain GET request if you intend to do such a one as |
| your next request. You force an easyhandle to go back to GET by using the |
| CURLOPT_HTTPGET option: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_HTTPGET, TRUE); |
| |
| Just setting CURLOPT_POSTFIELDS to "" or NULL will *not* stop libcurl from |
| doing a POST. It will just make it POST without any data to send! |
| |
| |
| Showing Progress |
| |
| [ built-in progress meter, progress callback ] |
| |
| |
| libcurl with C++ |
| |
| There's basically only one thing to keep in mind when using C++ instead of C |
| when interfacing libcurl: |
| |
| "The Callbacks Must Be Plain C" |
| |
| So if you want a write callback set in libcurl, you should put it within |
| 'extern "C"'. Similar to this: |
| |
| extern "C" { |
| size_t write_data(void *ptr, size_t size, size_t nmemb, |
| void *ourpointer) |
| { |
| /* do what you want with the data */ |
| return size * nmemb; /* tell libcurl all of it was handled */ |
| } |
| } |
| |
| This will of course effectively turn the callback code into C. There won't be |
| any "this" pointer available etc. |
| |
| |
| Proxies |
| |
| What "proxy" means according to Merriam-Webster: "a person authorized to act |
| for another" but also "the agency, function, or office of a deputy who acts |
| as a substitute for another". |
| |
| Proxies are exceedingly common these days. Companies often only offer |
| internet access to employees through their HTTP proxies. Network clients or |
| user-agents ask the proxy for documents, the proxy does the actual request |
| and then it returns them. |
| |
| libcurl has full support for HTTP proxies, so when a given URL is wanted, |
| libcurl will ask the proxy for it instead of trying to connect to the actual |
| host identified in the URL. |
| |
| The fact that the proxy is a HTTP proxy puts certain restrictions on what can |
| actually happen. A requested URL that might not be a HTTP URL will still |
| be passed to the HTTP proxy to deliver back to libcurl. This happens |
| transparently, and an application may not need to know. I say "may", because |
| at times it is very important to understand that all operations over a HTTP |
| proxy use the HTTP protocol. For example, you can't invoke your own |
| custom FTP commands or even proper FTP directory listings. |
| |
| Proxy Options |
| |
| To tell libcurl to use a proxy at a given port number: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080"); |
| |
| Some proxies require user authentication before allowing a request, and |
| you pass that information similar to this: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password"); |
| |
| If you want to, you can specify the host name only in the CURLOPT_PROXY |
| option, and set the port number separately with CURLOPT_PROXYPORT. |
| |
| Environment Variables |
| |
| libcurl automatically checks and uses a set of environment variables to know |
| what proxies to use for certain protocols. The names of the variables |
| follow an ancient de facto standard and are built up as |
| "[protocol]_proxy" (note the lower casing). This means the variable |
| 'http_proxy' is checked for the name of a proxy to use when the input URL is |
| HTTP. Following the same rule, the variable named 'ftp_proxy' is checked |
| for FTP URLs. Again, the proxies are always HTTP proxies, the different |
| names of the variables simply allow different HTTP proxies to be used. |
| |
| The proxy environment variable contents should be in the format |
| "[protocol://]machine[:port]". Where the protocol:// part is simply |
| ignored if present (so http://proxy and bluerk://proxy will do the same) |
| and the optional port number specifies on which port the proxy operates on |
| the host. If not specified, the internal default port number will be used |
| and that is most likely *not* the one you would like it to be. |
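| |
| Splitting a value in that format could look like the sketch below. |
| parse_proxy() is a hypothetical helper, not libcurl's own code. It reports |
| a port of -1 when none was given, leaving the choice of default to the |
| caller, and it does not attempt to handle numerical IPv6 addresses. |
| |
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "[protocol://]machine[:port]" into its parts.
   'host' must have room for 'hostlen' bytes. '*port' is set to -1 if no
   port was given. Returns 0 on success, -1 if the host name is empty or
   does not fit. */
int parse_proxy(const char *value, char *host, size_t hostlen, long *port)
{
  const char *sep = strstr(value, "://");
  const char *start = sep ? sep + 3 : value;  /* protocol part is ignored */
  size_t len = strcspn(start, ":/");          /* host ends at ':' or '/' */

  if(len == 0 || len >= hostlen)
    return -1;

  memcpy(host, start, len);
  host[len] = '\0';

  if(start[len] == ':')
    *port = strtol(start + len + 1, NULL, 10);
  else
    *port = -1;
  return 0;
}
```
| |
| As the text says, "http://proxy" and "bluerk://proxy" give the same result |
| here, since whatever precedes "://" is skipped. |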
| |
| There are two special environment variables. 'all_proxy' is what sets the |
| proxy for any URL in case the protocol specific variable wasn't set, and |
| 'no_proxy' defines a list of hosts that should not use a proxy even though |
| a variable may say so. If 'no_proxy' is a plain asterisk ("*") it matches |
| all hosts. |
| |
| SSL and Proxies |
| |
| SSL is for secure point-to-point connections. This involves strong |
| encryption and similar things, which effectively makes it impossible for a |
| proxy to operate as a "man in between", which is the proxy's task as |
| previously discussed. Instead, the only way to have SSL work over a HTTP |
| proxy is to ask the proxy to tunnel through everything without being able |
| to check or fiddle with the traffic. |
| |
| Opening an SSL connection over a HTTP proxy is therefore a matter of asking |
| the proxy for a straight connection to the target host on a specified |
| port. This is made with the HTTP request CONNECT. ("please mr proxy, |
| connect me to that remote host"). |
| |
| Because of the nature of this operation, where the proxy has no idea what |
| kind of data is passed in and out through this tunnel, this breaks |
| some of the very few advantages that come from using a proxy, such as |
| caching. Many organizations prevent this kind of tunneling to other |
| destination port numbers than 443 (which is the default HTTPS port |
| number). |
| |
| Tunneling Through Proxy |
| |
| As explained above, tunneling is required for SSL to work and often even |
| restricted to the operation intended for SSL; HTTPS. |
| |
| This is however not the only time proxy-tunneling might offer benefits to |
| you or your application. |
| |
| As tunneling opens a direct connection from your application to the remote |
| machine, it suddenly also re-introduces the ability to do non-HTTP |
| operations over a HTTP proxy. You can in fact use things such as FTP |
| upload or FTP custom commands this way. |
| |
| Again, this is often prevented by the administrators of proxies and is |
| rarely allowed. |
| |
| Tell libcurl to use proxy tunneling like this: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_HTTPPROXYTUNNEL, TRUE); |
| |
| In fact, there might even be times when you want to do plain HTTP |
| operations using a tunnel like this, as it then enables you to operate on |
| the remote server instead of asking the proxy to do so. libcurl will not |
| stand in the way for such innovative actions either! |
| |
| Proxy Auto-Config |
| |
| Netscape first came up with this. It is basically a web page (usually using |
| a .pac extension) with a javascript that when executed by the browser with |
| the requested URL as input, returns information to the browser on how to |
| connect to the URL. The returned information might be "DIRECT" (which |
| means no proxy should be used), "PROXY host:port" (to tell the browser |
| where the proxy for this particular URL is) or "SOCKS host:port" (to |
| direct the browser to a SOCKS proxy). |
| |
| libcurl has no means to interpret or evaluate javascript and thus it |
| doesn't support this. If you get yourself in a position where you face |
| this nasty invention, the following advice has been mentioned and used in |
| the past: |
| |
| - Depending on the javascript complexity, write up a script that |
| translates it to another language and execute that. |
| |
| - Read the javascript code and rewrite the same logic in another language. |
| |
| - Implement a javascript interpreter; people have successfully used the |
| Mozilla javascript engine in the past. |
| |
| - Ask your admins to stop this and use a static proxy setup or similar. |
| |
| |
| Persistence Is The Way to Happiness |
| |
| Re-cycling the same easy handle several times when doing multiple requests is |
| the way to go. |
| |
| After each single curl_easy_perform() operation, libcurl will keep the |
| connection alive and open. A subsequent request using the same easy handle to |
| the same host might just be able to use the already open connection! This |
| reduces network impact a lot. |
| |
| Even if the connection is dropped, all later connections involving SSL to the |
| same host will benefit from libcurl's session ID cache that drastically |
| reduces re-connection time. |
| |
| FTP connections that are kept alive save a lot of time, as the |
| command-response roundtrips are skipped, and also you don't risk getting |
| blocked without permission to login again like on many FTP servers only |
| allowing N persons to be logged in at the same time. |
| |
| libcurl caches DNS name resolving results, to make lookups of a previously |
| looked up name a lot faster. |
| |
| Other interesting details that improve performance for subsequent requests |
| may also be added in the future. |
| |
| Each easy handle will attempt to keep the last few connections alive for a |
| while in case they are to be used again. You can set the size of this "cache" |
| with the CURLOPT_MAXCONNECTS option. Default is 5. There is very seldom any |
| point in changing this value, and if you think of changing this it is often |
| just a matter of thinking again. |
| |
| When the connection cache gets filled, libcurl must close an existing |
| connection in order to get room for the new one. To know which connection to |
| close, libcurl uses a "close policy" that you can affect with the |
| CURLOPT_CLOSEPOLICY option. There are only two policies implemented as of this |
| writing (libcurl 7.9.4) and they are: |
| |
| CURLCLOSEPOLICY_LEAST_RECENTLY_USED simply closes the one that hasn't been |
| used for the longest time. This is the default behavior. |
| |
| CURLCLOSEPOLICY_OLDEST closes the oldest connection, the one that was |
| created the longest time ago. |
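| |
| The difference between the two policies can be sketched with a little |
| bookkeeping. The 'struct conn_info' type, the enum and pick_victim() are |
| invented for this illustration and do not mirror libcurl's internals. |
| |
```c
#include <stddef.h>
#include <time.h>

/* Hypothetical bookkeeping for this sketch, one entry per connection. */
struct conn_info {
  time_t created;    /* when the connection was opened */
  time_t last_used;  /* when it last carried a transfer */
};

enum close_policy { POLICY_LEAST_RECENTLY_USED, POLICY_OLDEST };

/* Return the index of the connection to close. 'count' must be > 0. */
size_t pick_victim(const struct conn_info *conns, size_t count,
                   enum close_policy policy)
{
  size_t victim = 0;
  size_t i;

  for(i = 1; i < count; i++) {
    /* compare creation times for OLDEST, last-use times otherwise */
    time_t cand = (policy == POLICY_OLDEST) ?
      conns[i].created : conns[i].last_used;
    time_t best = (policy == POLICY_OLDEST) ?
      conns[victim].created : conns[victim].last_used;

    if(cand < best)
      victim = i;   /* earlier timestamp wins */
  }
  return victim;
}
```
| |
| A connection that was opened long ago but used a moment ago survives under |
| the first policy and is closed under the second. |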
| |
| There are, or at least were, plans to support a close policy that would call |
| a user-specified callback to let the user be able to decide which connection |
| to dump when this is necessary, which is why CURLOPT_CLOSEFUNCTION still |
| exists as an option today. Nothing ever uses this though and this will not |
| be used within the foreseeable future either. |
| |
| To force your upcoming request to not use an already existing connection (it |
| will even close one first if there happens to be one alive to the same host |
| you're about to operate on), you can do that by setting CURLOPT_FRESH_CONNECT |
| to TRUE. In a similar spirit, you can also forbid the connection used by the |
| upcoming request from "lying" around and possibly getting re-used after the |
| request by setting CURLOPT_FORBID_REUSE to TRUE. |
| |
| |
| Customizing Operations |
| |
| There is an ongoing development today where more and more protocols are built |
| upon HTTP for transport. This has obvious benefits as HTTP is a tested and |
| reliable protocol that is widely deployed and has excellent proxy support. |
| |
| When you use one of these protocols, and even when doing other kinds of |
| programming, you may need to change the traditional HTTP (or FTP or...) |
| manners. You may need to change words, headers or various data. |
| |
| libcurl is your friend here too. |
| |
| If just changing the actual HTTP request keyword is what you want, like when |
| GET, HEAD or POST is not good enough for you, CURLOPT_CUSTOMREQUEST is there |
| for you. It is very simple to use: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_CUSTOMREQUEST, "MYOWNREQUEST"); |
| |
| When using the custom request, you change the request keyword of the actual |
| request you are performing. Thus, by default you make a GET request but you can |
| also make a POST operation (as described before) and then replace the POST |
| keyword if you want to. You're the boss. |
| |
| HTTP-like protocols pass a series of headers to the server when doing the |
| request, and you're free to pass any number of extra headers that you think |
| fit. Adding headers is this easy: |
| |
| struct curl_slist *headers = NULL; /* must start out as NULL */ |
| |
| headers = curl_slist_append(headers, "Hey-server-hey: how are you?"); |
| headers = curl_slist_append(headers, "X-silly-content: yes"); |
| |
| /* pass our list of custom made headers */ |
| curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers); |
| |
| curl_easy_perform(easyhandle); /* transfer http */ |
| |
| curl_slist_free_all(headers); /* free the header list */ |
| |
| ... and if you think some of the internally generated headers, such as |
| User-Agent:, Accept: or Host: don't contain the data you want them to |
| contain, you can replace them by simply setting them too: |
| |
| headers = curl_slist_append(headers, "User-Agent: 007"); |
| headers = curl_slist_append(headers, "Host: munged.host.line"); |
| |
| If you replace an existing header with one with no contents, you will prevent |
| the header from being sent. For example, if you want to completely prevent |
| the "Accept:" header from being sent, you can disable it with code similar to |
| this: |
| |
| headers = curl_slist_append(headers, "Accept:"); |
| |
| Both replacing and cancelling internal headers should be done with careful |
| consideration and you should be aware that you may violate the HTTP protocol |
| when doing so. |
| |
| Not all protocols are HTTP-like, and thus the above may not help you when you |
| want to make, for example, your FTP transfers behave differently. |
| |
| Sending custom commands to an FTP server means that you need to send the |
| commands exactly as the FTP server expects them (RFC959 is a good guide here), |
| and you can only use commands that work on the control-connection alone. All |
| kinds of commands that require data interchange and thus need a |
| data-connection must be left to libcurl's own judgement. Also be aware that |
| libcurl will do its very best to change directory to the target directory |
| before doing any transfer, so if you change directory (with CWD or similar) |
| you might confuse libcurl and then it might not attempt to transfer the file |
| in the correct remote directory. |
| |
| A little example that deletes a given file before an operation: |
| |
| headers = curl_slist_append(headers, "DELE file-to-remove"); |
| |
| /* pass the list of custom commands to the handle */ |
| curl_easy_setopt(easyhandle, CURLOPT_QUOTE, headers); |
| |
| curl_easy_perform(easyhandle); /* transfer ftp data! */ |
| |
| curl_slist_free_all(headers); /* free the header list */ |
| |
| If you would instead want this operation (or chain of operations) to happen |
| _after_ the data transfer took place, the option to curl_easy_setopt() would |
| instead be called CURLOPT_POSTQUOTE and used the exact same way. |
| |
| The custom FTP commands will be issued to the server in the same order they |
| are added to the list, and if a command gets an error code returned back from |
| the server, no more commands will be issued and libcurl will bail out with an |
| error code. Note that if you use CURLOPT_QUOTE to send commands before a |
| transfer, no transfer will actually take place when a quote command has |
| failed. |
| |
| [ custom FTP commands without transfer, FTP "header-only", HTTP 1.0 ] |
| |
| Cookies Without Chocolate Chips |
| |
| [ set cookies, read cookies from file, cookie-jar ] |
| |
| |
| Headers Equal Fun |
| |
| [ use the header callback for HTTP, FTP etc ] |
| |
| |
| Post Transfer Information |
| |
| [ curl_easy_getinfo ] |
| |
| |
| Security Considerations |
| |
| [ ps output, netrc plain text, plain text protocols / base64 ] |
| |
| |
| SSL, Certificates and Other Tricks |
| |
| [ seeding, passwords, keys, certificates, ENGINE, ca certs ] |
| |
| |
| Future |
| |
| [ multi interface, sharing between handles, mutexes, pipelining ] |
| |
| |
| ----- |
| Footnotes: |
| |
| [1] = HTTP PUT without knowing the size prior to transfer is indeed possible, |
| but libcurl does not support the chunked transfers on uploading that are |
| necessary for this feature to work. We'd gratefully appreciate patches |
| that bring this functionality... |
| |
| [2] = This happens on Windows machines when libcurl is built and used as a |
| DLL. However, you can still do this on Windows if you link with a static |
| library. |
| |
| [3] = The curl-config tool is generated at build-time (on unix-like systems) |
| and should be installed with the 'make install' or similar instruction |
| that installs the library, header files, man pages etc. |