package LWP;

$VERSION = "5.820";
sub Version { $VERSION; }

require 5.005;
require LWP::UserAgent;  # this should load everything you need

1;

__END__

=head1 NAME

LWP - The World-Wide Web library for Perl

=head1 SYNOPSIS

  use LWP;
  print "This is libwww-perl-$LWP::VERSION\n";

=head1 DESCRIPTION

The libwww-perl collection is a set of Perl modules which provides a
simple and consistent application programming interface (API) to the
World-Wide Web.  The main focus of the library is to provide classes
and functions that allow you to write WWW clients. The library also
contains modules that are of more general use and even classes that
help you implement simple HTTP servers.

Most modules in this library provide an object oriented API.  The user
agent, requests sent and responses received from the WWW server are
all represented by objects.  This makes a simple and powerful
interface to these services.  The interface is easy to extend
and customize for your own needs.

The main features of the library are:

=over 3

=item *

Contains various reusable components (modules) that can be
used separately or together.

=item *

Provides an object oriented model of HTTP-style communication.  Within
this framework we currently support access to http, https, gopher, ftp, news,
file, and mailto resources.

=item *

Provides a full object oriented interface or
a very simple procedural interface.

=item *

Supports the basic and digest authorization schemes.

=item *

Supports transparent redirect handling.

=item *

Supports access through proxy servers.

=item *

Provides a parser for F<robots.txt> files and a framework for constructing robots.

=item *

Supports parsing of HTML forms.

=item *

Implements an HTTP content negotiation algorithm that can
be used both in protocol modules and in server scripts (like CGI
scripts).

=item *

Supports HTTP cookies.

=item *

Provides some simple command line clients, for instance C<lwp-request> and C<lwp-download>.

=back

=head1 HTTP STYLE COMMUNICATION

The libwww-perl library is based on HTTP style communication. This
section tries to describe what that means.

Let us start with this quote from the HTTP specification document
<URL:http://www.w3.org/pub/WWW/Protocols/>:

=over 3

=item

The HTTP protocol is based on a request/response paradigm. A client
establishes a connection with a server and sends a request to the
server in the form of a request method, URI, and protocol version,
followed by a MIME-like message containing request modifiers, client
information, and possible body content. The server responds with a
status line, including the message's protocol version and a success or
error code, followed by a MIME-like message containing server
information, entity meta-information, and possible body content.

=back

What this means to libwww-perl is that communication always takes place
through these steps: First a I<request> object is created and
configured. This object is then passed to a server and we get a
I<response> object in return that we can examine. A request is always
independent of any previous requests, i.e. the service is stateless.
The same simple model is used for any kind of service we want to
access.

For example, if we want to fetch a document from a remote file server,
then we send it a request that contains a name for that document and
the response will contain the document itself.  If we access a search
engine, then the content of the request will contain the query
parameters and the response will contain the query result.  If we want
to send a mail message to somebody then we send a request object which
contains our message to the mail server and the response object will
contain an acknowledgment that tells us that the message has been
accepted and will be forwarded to the recipient(s).

It is as simple as that!

=head2 The Request Object

The libwww-perl request object has the class name C<HTTP::Request>.
The fact that the class name uses C<HTTP::> as a
prefix only implies that we use the HTTP model of communication.  It
does not limit the kind of services we can try to pass this I<request>
to.  For instance, we will send C<HTTP::Request>s both to ftp and
gopher servers, as well as to the local file system.

The main attributes of the request objects are:

=over 3

=item *

The B<method> is a short string that tells what kind of
request this is.  The most common methods are B<GET>, B<PUT>,
B<POST> and B<HEAD>.

=item *

The B<uri> is a string denoting the protocol, server and
the name of the "document" we want to access.  The B<uri> might
also encode various other parameters.

=item *

The B<headers> contain additional information about the
request and can also be used to describe the content.  The headers
are a set of keyword/value pairs.

=item *

The B<content> is an arbitrary amount of data.

=back
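
The four attributes listed above map onto accessor methods of the same
names.  The following is a minimal sketch, not taken from the library
documentation; the URL and the form data are made-up placeholders.

```perl
use strict;
use warnings;
use HTTP::Request;

# Build a request from its four main attributes: method, URI,
# headers (given here as an array ref of key/value pairs) and content.
# The URL and the form data are hypothetical placeholders.
my $req = HTTP::Request->new(
    POST => 'http://www.example.com/search',
    [ 'Content-Type' => 'application/x-www-form-urlencoded' ],
    'query=libwww-perl',
);

# Read each attribute back through its accessor.
print "method:  ", $req->method, "\n";
print "uri:     ", $req->uri, "\n";
print "header:  ", $req->header('Content-Type'), "\n";
print "content: ", $req->content, "\n";
```

Nothing is sent over the network here; the object only describes a
request until it is handed to a user agent.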

=head2 The Response Object

The libwww-perl response object has the class name C<HTTP::Response>.
The main attributes of objects of this class are:

=over 3

=item *

The B<code> is a numerical value that indicates the overall
outcome of the request.

=item *

The B<message> is a short, human readable string that
corresponds to the I<code>.

=item *

The B<headers> contain additional information about the
response and describe the content.

=item *

The B<content> is an arbitrary amount of data.

=back

Since we don't want to handle all possible I<code> values directly in
our programs, a libwww-perl response object has methods that can be
used to query what kind of response this is.  The most commonly used
response classification methods are:

=over 3

=item is_success()

The request was successfully received, understood or accepted.

=item is_error()

The request failed.  The server or the resource might not be
available, access to the resource might be denied or other things might
have failed for some reason.

=back
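
To see the classification methods at work without contacting a server,
a response object can be constructed by hand.  This is only a sketch
for illustration; in a real program the response comes back from the
user agent, and the code, message and content below are invented.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built response for illustration only; normally the user agent
# creates this object for us.
my $res = HTTP::Response->new(404, 'Not Found');
$res->header('Content-Type' => 'text/plain');
$res->content("No such document\n");

# Classify the outcome instead of comparing numeric codes directly.
if ($res->is_success) {
    print $res->content;
}
elsif ($res->is_error) {
    print "Request failed: ", $res->status_line, "\n";
}
```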

=head2 The User Agent

Let us assume that we have created a I<request> object. What do we
actually do with it in order to receive a I<response>?

The answer is that you pass it to a I<user agent> object and this
object takes care of all the things that need to be done
(like low-level communication and error handling) and returns
a I<response> object. The user agent represents your
application on the network and provides you with an interface that
can accept I<requests> and return I<responses>.

The user agent is an interface layer between
your application code and the network.  Through this interface you are
able to access the various servers on the network.

The class name for the user agent is C<LWP::UserAgent>.  Every
libwww-perl application that wants to communicate should create at
least one object of this class. The main method provided by this
object is request(). This method takes an C<HTTP::Request> object as
argument and (eventually) returns an C<HTTP::Response> object.

The user agent has many other attributes that let you
configure how it will interact with the network and with your
application.

=over 3

=item *

The B<timeout> specifies how much time we give remote servers to
respond before the library disconnects and creates an
internal I<timeout> response.

=item *

The B<agent> specifies the name that your application should use when it
presents itself on the network.

=item *

The B<from> attribute can be set to the e-mail address of the person
responsible for running the application.  If this is set, then the
address will be sent to the servers with every request.

=item *

The B<parse_head> specifies whether we should initialize response
headers from the E<lt>head> section of HTML documents.

=item *

The B<proxy> and B<no_proxy> attributes specify if and when to go through
a proxy server. <URL:http://www.w3.org/pub/WWW/Proxies/>

=item *

The B<credentials> provide a way to set up user names and
passwords needed to access certain services.

=back
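
The attributes above are set through methods of the same names on the
user agent object.  The following sketch shows one possible
configuration; every host name, address, realm and password in it is a
placeholder chosen for illustration, not a recommended value.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# All values below are hypothetical placeholders.
$ua->timeout(30);                        # seconds to wait for a server
$ua->agent('MyApp/0.1');                 # name sent as "User-Agent"
$ua->from('webmaster@example.com');      # address sent as "From"
$ua->parse_head(0);                      # do not read headers from <head>

# Route http and ftp requests through a proxy, except for some domains.
$ua->proxy(['http', 'ftp'], 'http://proxy.example.com:8080/');
$ua->no_proxy('localhost', 'example.com');

# User name and password for one server (host:port) and realm.
$ua->credentials('www.example.com:80', 'Members', 'user', 'secret');

print "timeout: ", $ua->timeout, "\n";
print "agent:   ", $ua->agent, "\n";
```

See L<LWP::UserAgent> for the authoritative description of each
method.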

Many applications want even more control over how they interact
with the network and they get this by sub-classing
C<LWP::UserAgent>.  The library includes a
sub-class, C<LWP::RobotUA>, for robot applications.

=head2 An Example

This example shows how the user agent, a request and a response are
represented in actual perl code:

  # Create a user agent object
  use LWP::UserAgent;
  my $ua = LWP::UserAgent->new;
  $ua->agent("MyApp/0.1 ");  # trailing space: the default agent string is appended

  # Create a request
  my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
  $req->content_type('application/x-www-form-urlencoded');
  $req->content('query=libwww-perl&mode=dist');

  # Pass request to the user agent and get a response back
  my $res = $ua->request($req);

  # Check the outcome of the response
  if ($res->is_success) {
      print $res->content;
  }
  else {
      print $res->status_line, "\n";
  }

The $ua is created once when the application starts up.  New request
objects should normally be created for each request sent.

=head1 NETWORK SUPPORT

This section discusses the various protocol schemes and
the HTTP style methods and headers that may be used for each.

For all requests, a "User-Agent" header is added and initialized from
the $ua->agent attribute before the request is handed to the network
layer.  In the same way, a "From" header is initialized from the
$ua->from attribute.

For all responses, the library adds a header called "Client-Date".
This header holds the time when the response was received by
your application.  The format and semantics of the header are the
same as the server created "Date" header.  You may also encounter other
"Client-XXX" headers.  They are all generated by the library
internally and are not received from the servers.

=head2 HTTP Requests

HTTP requests are just handed off to an HTTP server and it
decides what happens.  Few servers implement methods besides the usual
"GET", "HEAD", "POST" and "PUT", but CGI-scripts may implement
any method they like.

If the server is not available then the library will generate an
internal error response.

The library automatically adds a "Host" and a "Content-Length" header
to the HTTP request before it is sent over the network.

For a GET request you might want to add a "If-Modified-Since" or
"If-None-Match" header to make the request conditional.
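
A conditional GET could be prepared as in the sketch below.  The URL,
the one-day cut-off and the entity tag are assumptions made up for the
example; C<time2str> from C<HTTP::Date> formats a time value as an
HTTP date.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Date qw(time2str);

# Hypothetical URL; only fetch it if it changed during the last day.
my $req = HTTP::Request->new(GET => 'http://www.example.com/index.html');
my $last_fetch = time - 24 * 60 * 60;
$req->header('If-Modified-Since' => time2str($last_fetch));

# Alternatively (or additionally) compare against a stored entity tag.
# The tag value here is invented.
$req->header('If-None-Match' => '"abc123"');

print $req->as_string;
```

A server that honours these headers may answer with a "304 Not
Modified" response instead of sending the document again.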

For a POST request you should add the "Content-Type" header.  When you
try to emulate HTML E<lt>FORM> handling you should usually let the value
of the "Content-Type" header be "application/x-www-form-urlencoded".
See L<lwpcook> for examples of this.
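
One way to build such a form submission is the C<POST> function from
C<HTTP::Request::Common>, which encodes the fields and sets the
"Content-Type" header itself.  This is a sketch with a made-up URL and
made-up field names, not the only way to do it.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# The URL and the form fields are hypothetical placeholders.
my $req = POST 'http://www.example.com/search',
    [ query => 'libwww-perl', mode => 'dist' ];

# The helper has filled in the content type and the encoded content.
print scalar($req->content_type), "\n";
print $req->content, "\n";
```

The resulting object is an ordinary C<HTTP::Request> and can be passed
to the request() method of a user agent.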

The libwww-perl HTTP implementation currently supports the HTTP/1.1
and HTTP/1.0 protocols.

The library allows you to access a proxy server through HTTP.  This
means that you can set up the library to forward all types of requests
through the HTTP protocol module.  See L<LWP::UserAgent> for
documentation of this.

=head2 HTTPS Requests

HTTPS requests are HTTP requests over an encrypted network connection
using the SSL protocol developed by Netscape.  Everything about HTTP
requests above also applies to HTTPS requests.  In addition the library
will add the headers "Client-SSL-Cipher", "Client-SSL-Cert-Subject" and
"Client-SSL-Cert-Issuer" to the response.  These headers denote the
encryption method used and the name of the server owner.

The request can contain the header "If-SSL-Cert-Subject" in order to
make the request conditional on the content of the server certificate.
If the certificate subject does not match, no request is sent to the
server and an internally generated error response is returned.  The
value of the "If-SSL-Cert-Subject" header is interpreted as a Perl
regular expression.

=head2 FTP Requests

The library currently supports GET, HEAD and PUT requests.  GET
retrieves a file or a directory listing from an FTP server.  PUT
stores a file on an FTP server.

You can specify an FTP account for servers that want this in addition
to user name and password.  This is specified by including an "Account"
header in the request.

User name/password can be specified using basic authorization or be
encoded in the URL.  Failed logins return an UNAUTHORIZED response with
"WWW-Authenticate: Basic" and can be treated like basic authorization
for HTTP.

The library supports ftp ASCII transfer mode by specifying the "type=a"
parameter in the URL. It also supports transfer of ranges for FTP transfers
using the "Range" header.
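
The options described so far might be combined as in the following
sketch.  The host, path, login, account name and byte range are all
invented, and the "bytes=..." form of the "Range" value is assumed to
follow the usual HTTP syntax.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical server and file; ";type=a" asks for ASCII transfer mode.
my $req = HTTP::Request->new(
    GET => 'ftp://ftp.example.com/pub/README;type=a');

# Login via basic authorization instead of putting it in the URL.
$req->authorization_basic('anonymous', 'me@example.com');

# Extra account name for servers that ask for one (placeholder value).
$req->header(Account => 'projects');

# Ask for the first 1024 bytes only (assumed HTTP-style range syntax).
$req->header(Range => 'bytes=0-1023');

print $req->as_string;
```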

Directory listings are by default returned unprocessed (as returned
from the ftp server) with the content media type reported to be
"text/ftp-dir-listing". The C<File::Listing> module provides methods
for parsing these directory listings.

The ftp module is also able to convert directory listings to HTML and
this can be requested via the standard HTTP content negotiation
mechanisms (add an "Accept: text/html" header in the request if you
want this).

For normal file retrievals, the "Content-Type" is guessed based on the
file name suffix. See L<LWP::MediaTypes>.

The "If-Modified-Since" request header works for servers that implement
the MDTM command.  It will probably not work for directory listings though.

Example:

  $req = HTTP::Request->new(GET => 'ftp://me:passwd@ftp.some.where.com/');
  $req->header(Accept => "text/html, */*;q=0.1");

=head2 News Requests

Access to the USENET News system is implemented through the NNTP
protocol.  The name of the news server is obtained from the
NNTP_SERVER environment variable and defaults to "news".  It is not
possible to specify the hostname of the NNTP server in news: URLs.

The library supports GET and HEAD to retrieve news articles through the
NNTP protocol.  You can also post articles to newsgroups by using
(surprise!) the POST method.

GET on newsgroups is not implemented yet.

Examples:

  $req = HTTP::Request->new(GET => 'news:abc1234@a.sn.no');

  $req = HTTP::Request->new(POST => 'news:comp.lang.perl.test');
  $req->header(Subject => 'This is a test',
               From    => 'me@some.where.org');
  $req->content(<<EOT);
  This is the content of the message that we are sending to
  the world.
  EOT

=head2 Gopher Request

The library supports the GET and HEAD methods for gopher requests.  All
request header values are ignored.  HEAD cheats and returns a
response without even talking to the server.

Gopher menus are always converted to HTML.

The response "Content-Type" is generated from the document type
encoded (as the first letter) in the request URL path itself.

Example:

  $req = HTTP::Request->new(GET => 'gopher://gopher.sn.no/');

=head2 File Request

The library supports GET and HEAD methods for file requests.  The
"If-Modified-Since" header is supported.  All other headers are
ignored.  The I<host> component of the file URL must be empty or set
to "localhost".  Any other I<host> value will be treated as an error.

Directories are always converted to an HTML document.  For normal
files, the "Content-Type" and "Content-Encoding" in the response are
guessed based on the file suffix.

Example:

  $req = HTTP::Request->new(GET => 'file:/etc/passwd');

=head2 Mailto Request

You can send (aka "POST") mail messages using the library.  All
headers specified for the request are passed on to the mail system.
The "To" header is initialized from the mail address in the URL.

Example:

  $req = HTTP::Request->new(POST => 'mailto:libwww@perl.org');
  $req->header(Subject => "subscribe");
  $req->content("Please subscribe me to the libwww-perl mailing list!\n");

=head2 CPAN Requests

URLs with scheme C<cpan:> are redirected to a suitable CPAN
mirror.  If you have your own local mirror of CPAN you might tell LWP
to use it for C<cpan:> URLs by an assignment like this:

  $LWP::Protocol::cpan::CPAN = "file:/local/CPAN/";

Suitable CPAN mirrors are also picked up from the configuration for
the CPAN.pm, so if you have used that module a suitable mirror should
be picked automatically.  If neither of these applies, then a redirect
to the generic CPAN http location is issued.

Example request to download the newest perl:

  $req = HTTP::Request->new(GET => "cpan:src/latest.tar.gz");

=head1 OVERVIEW OF CLASSES AND PACKAGES

This table should give you a quick overview of the classes provided by the
library. Indentation shows class inheritance.

 LWP::MemberMixin   -- Access to member variables of Perl5 classes
   LWP::UserAgent   -- WWW user agent class
     LWP::RobotUA   -- When developing robot applications
   LWP::Protocol          -- Interface to various protocol schemes
     LWP::Protocol::http  -- http:// access
     LWP::Protocol::file  -- file:// access
     LWP::Protocol::ftp   -- ftp:// access

 LWP::Authen::Basic -- Handle 401 and 407 responses
 LWP::Authen::Digest

 HTTP::Headers      -- MIME/RFC822 style header (used by HTTP::Message)
 HTTP::Message      -- HTTP style message
   HTTP::Request    -- HTTP request
   HTTP::Response   -- HTTP response
 HTTP::Daemon       -- A HTTP server class

 WWW::RobotRules    -- Parse robots.txt files
   WWW::RobotRules::AnyDBM_File -- Persistent RobotRules

 Net::HTTP          -- Low level HTTP client

The following modules provide various functions and definitions.

 LWP                -- This file.  Library version number and documentation.
 LWP::MediaTypes    -- MIME types configuration (text/html etc.)
 LWP::Debug         -- Debug logging module
 LWP::Simple        -- Simplified procedural interface for common functions
 HTTP::Status       -- HTTP status code (200 OK etc)
 HTTP::Date         -- Date parsing module for HTTP date formats
 HTTP::Negotiate    -- HTTP content negotiation calculation
 File::Listing      -- Parse directory listings
 HTML::Form         -- Processing for <form>s in HTML documents
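
Several of the function modules can be used on their own, without a
user agent.  As a small sketch, the snippet below uses C<HTTP::Status>
and C<HTTP::Date>; the status codes and the time value are arbitrary
choices for the example.

```perl
use strict;
use warnings;
use HTTP::Status qw(status_message is_success is_error);
use HTTP::Date qw(time2str str2time);

# Map a numeric status code to its standard message and classify it.
print "404 means: ", status_message(404), "\n";
print "200 is a success code\n" if is_success(200);
print "500 is an error code\n"  if is_error(500);

# Convert between epoch seconds and the HTTP date format.
my $stamp = time2str(0);          # start of the Unix epoch, in GMT
my $epoch = str2time($stamp);     # and back to seconds again
print "$stamp <=> $epoch\n";
```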

=head1 MORE DOCUMENTATION

All modules contain detailed information on the interfaces they
provide.  The I<lwpcook> manpage is the libwww-perl cookbook that contains
examples of typical usage of the library.  You might want to take a
look at how the scripts C<lwp-request>, C<lwp-rget> and C<lwp-mirror>
are implemented.

=head1 ENVIRONMENT

The following environment variables are used by LWP:

=over

=item HOME

The C<LWP::MediaTypes> functions will look for the F<.media.types> and
F<.mime.types> files relative to your home directory.

=item http_proxy

=item ftp_proxy

=item xxx_proxy

=item no_proxy

These environment variables can be set to enable communication through
a proxy server.  See the description of the C<env_proxy> method in
L<LWP::UserAgent>.

=item PERL_LWP_USE_HTTP_10

Enable the old HTTP/1.0 protocol driver instead of the new HTTP/1.1
driver.  You might want to set this to a TRUE value if you discover
that your old LWP applications fail after you installed LWP-5.60 or
better.

=item PERL_HTTP_URI_CLASS

Used to decide what URI objects to instantiate.  The default is C<URI>.
You might want to set it to C<URI::URL> for compatibility with old times.

=back
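
The proxy variables only take effect once the user agent is told to
read them.  The sketch below sets the environment inside the script so
that it is self-contained; normally the variables would already be set
by the shell.  The proxy URL and the domain list are placeholders.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

{
    # Temporarily replace the environment with made-up proxy settings,
    # so the example does not depend on the caller's real environment.
    local %ENV = (
        http_proxy => 'http://proxy.example.com:8080/',
        no_proxy   => 'localhost,example.com',
    );

    # Load the *_proxy and no_proxy settings from the environment.
    $ua->env_proxy;
}

print "http requests go through: ", $ua->proxy('http'), "\n";
```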

=head1 AUTHORS

LWP was made possible by contributions from Adam Newby, Albert
Dvornik, Alexandre Duret-Lutz, Andreas Gustafsson, Andreas König,
Andrew Pimlott, Andy Lester, Ben Coleman, Benjamin Low, Ben Low, Ben
Tilly, Blair Zajac, Bob Dalgleish, BooK, Brad Hughes, Brian
J. Murrell, Brian McCauley, Charles C. Fu, Charles Lane, Chris Nandor,
Christian Gilmore, Chris W. Unger, Craig Macdonald, Dale Couch, Dan
Kubb, Dave Dunkin, Dave W. Smith, David Coppit, David Dick, David
D. Kilzer, Doug MacEachern, Edward Avis, erik, Gary Shea, Gisle Aas,
Graham Barr, Gurusamy Sarathy, Hans de Graaff, Harald Joerg, Harry
Bochner, Hugo, Ilya Zakharevich, INOUE Yoshinari, Ivan Panchenko, Jack
Shirazi, James Tillman, Jan Dubois, Jared Rhine, Jim Stern, Joao
Lopes, John Klar, Johnny Lee, Josh Kronengold, Josh Rai, Joshua
Chamas, Joshua Hoblitt, Kartik Subbarao, Keiichiro Nagano, Ken
Williams, KONISHI Katsuhiro, Lee T Lindley, Liam Quinn, Marc Hedlund,
Marc Langheinrich, Mark D. Anderson, Marko Asplund, Mark Stosberg,
Markus B Krüger, Markus Laker, Martijn Koster, Martin Thurn, Matthew
Eldridge, Matthew.van.Eerde, Matt Sergeant, Michael A. Chase, Michael
Quaranta, Michael Thompson, Mike Schilli, Moshe Kaminsky, Nathan
Torkington, Nicolai Langfeldt, Norton Allen, Olly Betts, Paul
J. Schinder, peterm, Philip Guenther, Daniel Buenzli, Pon Hwa Lin,
Radoslaw Zielinski, Radu Greab, Randal L. Schwartz, Richard Chen,
Robin Barker, Roy Fielding, Sander van Zoest, Sean M. Burke,
shildreth, Slaven Rezic, Steve A Fink, Steve Hay, Steven Butler,
Steve_Kilbane, Takanori Ugai, Thomas Lotterer, Tim Bunce, Tom Hughes,
Tony Finch, Ville Skyttä, Ward Vandewege, William York, Yale Huang,
and Yitzchak Scott-Thoennes.

LWP owes a lot in motivation, design, and code, to the libwww-perl
library for Perl4 by Roy Fielding, which included work from Alberto
Accomazzi, James Casey, Brooks Cutter, Martijn Koster, Oscar
Nierstrasz, Mel Melchner, Gertjan van Oosten, Jared Rhine, Jack
Shirazi, Gene Spafford, Marc VanHeyningen, Steven E. Brenner, Marion
Hakanson, Waldemar Kebsch, Tony Sanders, and Larry Wall; see the
libwww-perl-0.40 library for details.

=head1 COPYRIGHT

  Copyright 1995-2008, Gisle Aas
  Copyright 1995, Martijn Koster

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=head1 AVAILABILITY

The latest version of this library is likely to be available from CPAN
as well as:

  http://gitorious.org/projects/libwww-perl

The best place to discuss this code is on the <libwww@perl.org>
mailing list.

=cut