| $Id$ |
|       _   _ ____  _      |
|   ___| | | |  _ \| |     |
|  / __| | | | |_) | |     |
| | (__| |_| |  _ <| |___  |
|  \___|\___/|_| \_\_____| |
| |
| PROGRAMMING WITH LIBCURL |
| |
| About this Document |
| |
| This document attempts to describe the general principles and some basic |
| approaches to consider when programming with libcurl. The text focuses |
| mainly on the C/C++ interface, but it should apply fairly well to other |
| interfaces too, as they usually follow the C one pretty closely. |
| |
| This document will refer to 'the user' as the person writing the source code |
| that uses libcurl. That would probably be you or someone in your position. |
| What will be generally referred to as 'the program' will be the collected |
| source code that you write that is using libcurl for transfers. The program |
| is outside libcurl and libcurl is outside of the program. |
| |
| |
| Building |
| |
| There are many different ways to build C programs. This chapter will assume |
| a unix-style build process. If you use a different build system, you can |
| still read this to get general information that may apply to your |
| environment as well. |
| |
| Compiling the Program |
| |
| Your compiler needs to know where the libcurl headers are |
| located. Therefore you must set your compiler's include path to point to |
| the directory where you installed them. The 'curl-config'[3] tool can be |
| used to get this information: |
| |
| $ curl-config --cflags |
| |
| Linking the Program with libcurl |
| |
| Once you have compiled the program, you need to link your object files to |
| create a single executable. For that to succeed, you need to link with |
| libcurl and possibly also with other libraries that libcurl itself depends |
| on, such as the OpenSSL libraries. Some standard OS libraries may also be |
| needed on the command line. To figure out which flags to use, once again |
| the 'curl-config' tool comes to the rescue: |
| |
| $ curl-config --libs |
| |
| SSL or Not |
| |
| libcurl can be built and customized in many ways. One of the things that |
| varies between different libraries and builds is the support for SSL-based |
| transfers, like HTTPS and FTPS. If OpenSSL was detected properly at |
| build-time, libcurl will be built with SSL support. To figure out if an |
| installed libcurl has been built with SSL support enabled, use |
| 'curl-config' like this: |
| |
| $ curl-config --feature |
| |
| If SSL is supported, the keyword 'SSL' will be written to stdout, |
| possibly together with a few other features that can be switched on or off |
| in different libcurl builds. |
| |
| |
| Portable Code in a Portable World |
| |
| The people behind libcurl have put considerable effort into making libcurl |
| work on a large number of different operating systems and environments. |
| |
| You program libcurl the same way on all platforms that libcurl runs on. There |
| are only a very few minor considerations that differ. If you just make sure |
| to write your code portably enough, you may very well create yourself a very |
| portable program. libcurl shouldn't stop you from that. |
| |
| |
| Global Preparation |
| |
| The program must initialize some of the libcurl functionality globally. That |
| means it should be done exactly once, no matter how many times you intend to |
| use the library: once for your program's entire lifetime. This is done using |
| |
| curl_global_init() |
| |
| and it takes one parameter which is a bit pattern that tells libcurl what to |
| initialize. Using CURL_GLOBAL_ALL will make it initialize all known internal |
| sub modules, and might be a good default option. The two bits that are |
| currently specified are: |
| |
| CURL_GLOBAL_WIN32 which only does anything on Windows machines. When used on |
| a Windows machine, it'll make libcurl initialize the win32 socket |
| stuff. Without having that initialized properly, your program cannot use |
| sockets properly. You should only do this once for each application, so if |
| your program already does this or if another library in use does it, you |
| should not tell libcurl to do this as well. |
| |
| CURL_GLOBAL_SSL which only does anything on libcurls built with SSL |
| support. On these systems, this will make libcurl initialize OpenSSL properly |
| for this application. This only needs to be done once for each application, |
| so if your program or another library already does this, this bit should not |
| be needed. |
| |
| libcurl has a default protection mechanism that detects if curl_global_init() |
| hasn't been called by the time curl_easy_perform() is called and if that is |
| the case, libcurl runs the function itself with a guessed bit pattern. Please |
| note that relying solely on this is not considered good practice. |
| |
| When the program no longer uses libcurl, it should call |
| curl_global_cleanup(), which is the opposite of the init call. It will then |
| do the reverse operations to clean up the resources the curl_global_init() |
| call initialized. |
| |
| Repeated calls to curl_global_init() and curl_global_cleanup() should be |
| avoided. They should be called once each. |
| |
| Handle the easy libcurl |
| |
| libcurl version 7 is oriented around the so called easy interface. All |
| operations in the easy interface are prefixed with 'curl_easy'. |
| |
| Future libcurls will also offer the multi interface. More about that |
| interface, what it is targeted for and how to use it is still only debated on |
| the libcurl mailing list and developer web pages. Join up to discuss and |
| figure out! |
| |
| To use the easy interface, you must first create yourself an easy handle. You |
| need one handle for each easy session you want to perform. Basically, you |
| should use one handle for every thread you plan to use for transferring. You |
| must never share the same handle in multiple threads. |
| |
| Get an easy handle with |
| |
| easyhandle = curl_easy_init(); |
| |
| It returns an easy handle. Using that you proceed to the next step: setting |
| up your preferred actions. A handle is just a logical entity for the upcoming |
| transfer or series of transfers. One of the most basic properties to set in |
| the handle is the URL. You set your preferred URL to transfer with |
| CURLOPT_URL in a manner similar to: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_URL, "http://curl.haxx.se/"); |
| |
| Let's assume for a while that you want to receive data, as the URL identifies |
| a remote resource you want to get here. Since you are writing an application |
| that needs this transfer, I assume that you would like to get the data passed |
| to you directly instead of simply getting it passed to stdout. So, you write |
| your own function that matches this prototype: |
| |
| size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp); |
| |
| You tell libcurl to pass all data to this function by issuing a call |
| similar to this: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, write_data); |
| |
| You can control what data your function gets in the fourth argument by |
| setting another property: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_FILE, &internal_struct); |
| |
| Using that property, you can easily pass local data between your application |
| and the function that gets invoked by libcurl. libcurl itself won't touch the |
| data you pass with CURLOPT_FILE. |
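| |
| A write callback that collects everything in memory could be sketched as |
| follows. The struct memory type and its fields are assumptions made for this |
| example; libcurl only hands the pointer through to the callback. |
| |

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct owned by the program; a pointer to it is what
   would be passed to libcurl with CURLOPT_FILE. */
struct memory {
  char *data;    /* received bytes, kept zero terminated */
  size_t len;    /* number of bytes stored so far */
};

/* Matches the write callback prototype: appends size*nmemb bytes to the
   struct that userp points to and returns how many bytes were stored. */
size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp)
{
  struct memory *mem = (struct memory *)userp;
  size_t bytes = size * nmemb;
  char *grown = (char *)realloc(mem->data, mem->len + bytes + 1);

  if(!grown)
    return 0;   /* out of memory: nothing was stored */

  memcpy(grown + mem->len, buffer, bytes);
  mem->data = grown;
  mem->len += bytes;
  mem->data[mem->len] = '\0';
  return bytes;
}
```

| |
| The program would start with an empty 'struct memory', pass its address with |
| CURLOPT_FILE, and free the 'data' member once it is done with the result. |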
| |
| libcurl offers its own default internal callback that'll take care of the |
| data if you don't set the callback with CURLOPT_WRITEFUNCTION. It will then |
| simply output the received data to stdout. You can have the default callback |
| write the data to a different file handle by passing a 'FILE *' to a file |
| opened for writing with the CURLOPT_FILE option. |
| |
| Now, we need to take a step back and take a deep breath. Here's one of those |
| rare platform-dependent nitpicks. Did you spot it? On some platforms[2], |
| libcurl won't be able to operate on files opened by the program. Thus, if you |
| use the default callback and pass in an open file with CURLOPT_FILE, it |
| will crash. You should therefore avoid this to make your program run fine |
| virtually everywhere. |
| |
| There are of course many more options you can set, and we'll get back to a |
| few of them later. Let's instead continue to the actual transfer: |
| |
| success = curl_easy_perform(easyhandle); |
| |
| curl_easy_perform() will connect to the remote site, issue the necessary |
| commands and receive the transfer. Whenever it receives data, it calls the |
| callback function we previously set. The function may get one byte at a time, |
| or it may get many kilobytes at once. libcurl delivers as much as possible as |
| often as possible. Your callback function should return the number of bytes |
| it "took care of". If that is not the exact same amount of bytes that was |
| passed to it, libcurl will abort the operation and return with an error code. |
| |
| When the transfer is complete, the function returns a return code that |
| informs you if it succeeded in its mission or not. If a return code isn't |
| enough for you, you can use the CURLOPT_ERRORBUFFER to point libcurl to a |
| buffer of yours where it'll store a human readable error message as well. |
| |
| If you then want to transfer another file, the handle is ready to be used |
| again. Mind you, it is even preferred that you re-use an existing handle if |
| you intend to make another transfer. libcurl will then attempt to re-use the |
| previous |
| |
| |
| When It Doesn't Work |
| |
| There will always be times when the transfer fails for some reason. You might |
| have set the wrong libcurl option or misunderstood what the libcurl option |
| actually does, or the remote server might return non-standard replies that |
| confuse the library which then confuses your program. |
| |
| There's one golden rule when these things occur: set the CURLOPT_VERBOSE |
| option to TRUE. It'll cause the library to spew out all the protocol |
| details it sends, some internal info and some received protocol data as well |
| (especially when using FTP). If you're using HTTP, adding the headers to the |
| received output for study is also a clever way to get a better understanding |
| of why the server behaves the way it does. Include headers in the normal body |
| output by setting CURLOPT_HEADER to TRUE. |
| |
| Of course there are bugs left. We need to get to know about them to be able |
| to fix them, so we're quite dependent on your bug reports! When you do report |
| suspected bugs in libcurl, please include as many details as you can: a |
| protocol dump that CURLOPT_VERBOSE produces, library version, as much as |
| possible of your code that uses libcurl, operating system name and version, |
| compiler name and version etc. |
| |
| |
| Upload Data to a Remote Site |
| |
| libcurl tries to keep a protocol independent approach to most transfers, thus |
| uploading to a remote FTP site is very similar to uploading data to an HTTP |
| server with a PUT request. |
| |
| Of course, first you either create an easy handle or you re-use an existing |
| one. Then you set the URL to operate on just like before. This is the remote |
| URL that we will now upload to. |
| |
| Since we write an application, we most likely want libcurl to get the upload |
| data by asking us for it. To make it do that, we set the read callback and |
| the custom pointer libcurl will pass to our read callback. The read callback |
| should have a prototype similar to: |
| |
| size_t function(char *buffer, size_t size, size_t nitems, void *userp); |
| |
| Where buffer is the pointer to a buffer we fill in with data to upload and |
| size*nitems is the size of the buffer. The 'userp' pointer is the custom |
| pointer we set to point to a struct of ours to pass private data between the |
| application and the callback. |
| |
| curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function); |
| |
| curl_easy_setopt(easyhandle, CURLOPT_INFILE, &filedata); |
| |
| Tell libcurl that we want to upload: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, TRUE); |
| |
| A few protocols won't behave properly when uploads are done without any prior |
| knowledge of the expected file size. HTTP PUT is one example [1]. So, set the |
| upload file size using the CURLOPT_INFILESIZE like this: |
| |
| curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE, file_size); |
| |
| Then, when you call curl_easy_perform() this time, it'll perform all the |
| necessary operations, and once it has started the upload it'll call your |
| supplied callback to get the data to upload. The program should return as |
| much data as possible in every invocation, as that is likely to make the |
| upload perform as fast as possible. The callback should return the number of |
| bytes it wrote in the buffer. Returning 0 will signal the end of the upload. |
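| |
| A read callback that feeds libcurl from a block of memory could be sketched |
| like this. The struct upload type and its fields are assumptions made for |
| the example, not something libcurl defines. |
| |

```c
#include <string.h>

/* Hypothetical struct describing the data still waiting to be uploaded;
   a pointer to it is what would be passed with CURLOPT_INFILE. */
struct upload {
  const char *data;   /* next byte to send */
  size_t left;        /* bytes not yet handed to libcurl */
};

/* Matches the read callback prototype: copies as much pending data as
   fits into buffer and returns the byte count, 0 when nothing is left. */
size_t read_function(char *buffer, size_t size, size_t nitems, void *userp)
{
  struct upload *up = (struct upload *)userp;
  size_t room = size * nitems;
  size_t bytes = up->left < room ? up->left : room;

  memcpy(buffer, up->data, bytes);
  up->data += bytes;
  up->left -= bytes;
  return bytes;
}
```

| |
| With a struct like this, its address would go to libcurl with CURLOPT_INFILE |
| and the total number of bytes with CURLOPT_INFILESIZE. |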
| |
| |
| Passwords |
| |
| Many protocols use or even require that a user name and password be provided |
| to be able to download or upload the data of your choice. libcurl offers |
| several ways to specify them. |
| |
| [ URL, options, callback ] |
| |
| |
| Showing Progress |
| |
| |
| libcurl with C++ |
| |
| There's basically only one thing to keep in mind when using C++ instead of C |
| when interfacing with libcurl: |
| |
| "The Callbacks Must Be Plain C" |
| |
| So if you want a write callback set in libcurl, you should put it within |
| 'extern "C"'. Similar to this: |
| |
| extern "C" { |
|   size_t write_data(void *ptr, size_t size, size_t nmemb, |
|                     void *ourpointer) |
|   { |
|     /* do what you want with the data */ |
|     return size * nmemb; /* tell libcurl every byte was handled */ |
|   } |
| } |
| |
| This will of course effectively turn the callback code into C. There won't be |
| any "this" pointer available etc. |
| |
| |
| Security Considerations |
| |
| |
| Certificates and Other SSL Tricks |
| |
| |
| Future |
| |
| |
| |
| ----- |
| Footnotes: |
| |
| [1] = HTTP PUT without knowing the size prior to transfer is indeed possible, |
| but libcurl does not support the chunked uploads that are necessary for |
| this feature to work. We'd gratefully appreciate patches that bring this |
| functionality... |
| |
| [2] = This happens on Windows machines when libcurl is built and used as a |
| DLL. However, you can still do this on Windows if you link with a static |
| library. |
| |
| [3] = The curl-config tool is generated at build-time (on unix-like systems) |
| and should be installed with the 'make install' or similar instruction |
| that installs the library, header files, man pages etc. |