API documentation for the mechanize Browser object. You can create a mechanize Browser instance with br = mechanize.Browser().
Contents
- Browser API
mechanize.Browser(history=None, request_class=None, content_parser=None, factory_class=mechanize._html.Factory, allow_xhtml=False)
Browser-like class with support for history, forms and links.
BrowserStateError is raised whenever the browser is in the wrong state to complete the requested operation, e.g. when back() is called when the browser history is empty, or when follow_link() is called when the current response does not contain HTML data.
Public attributes:
- request: current request (mechanize.Request)
- form: currently selected form (see select_form())
add_client_certificate(url, key_file, cert_file)
Add an SSL client certificate, for HTTPS client auth.
key_file and cert_file must be filenames of the key and certificate files, in PEM format. You can use e.g. OpenSSL to convert a p12 (PKCS12) file to PEM format:
    openssl pkcs12 -clcerts -nokeys -in cert.p12 -out cert.pem
    openssl pkcs12 -nocerts -in cert.p12 -out key.pem
Note that client certificate password input is currently very inflexible: it seems to be console-only, which is presumably the default behaviour of libopenssl. In future, mechanize may support third-party libraries that (I assume) allow more options here.
back(n=1)
Go back n steps in history, and return the response object.
n: go back this number of steps (default 1 step)
click(*args, **kwds)
See mechanize.HTMLForm.click() for documentation.
click_link(link=None, **kwds)
Find a link and return a Request object for it.
Arguments are as for find_link(), except that a link may be supplied as the first argument.
cookiejar
Return the current cookiejar (mechanize.CookieJar) or None.
find_link(text=None, text_regex=None, name=None, name_regex=None, url=None, url_regex=None, tag=None, predicate=None, nr=0)
Find a link in the current page.
Links are returned as mechanize.Link objects.
Links include anchors <a>, image maps <area>, and frames <iframe>.
All arguments must be passed by keyword, not position. Zero or more arguments may be supplied. In order to find a link, all arguments supplied must match.
If a matching link is not found, mechanize.LinkNotFoundError is raised.
follow_link(link=None, **kwds)
Find a link and open() it.
Arguments are as for click_link().
The return value is the same as for open().
forms()
Return an iterable over forms.
The returned form objects implement the mechanize.HTMLForm interface.
geturl()
Get URL of current document.
global_form()
Return the global form object, or None if the factory implementation did not supply one.
The "global" form object contains all controls that are not descendants of any FORM element.
The returned form object implements the mechanize.HTMLForm interface.
This is a separate method since the global form is not regarded as part of the sequence of forms in the document, mostly for backwards compatibility.
links(**kwds)
Return an iterable over links (mechanize.Link objects).
open(url_or_request, data=None, timeout=<object object>)
Open a URL. Loads the page so that you can subsequently use forms(), links(), etc. on it.
Returns: a response object.
open_novisit(url_or_request, data=None, timeout=<object object>)
Open a URL without visiting it.
Browser state (including request, response, history, forms and links) is left unchanged by calling this function.
The interface is the same as for open().
This is useful for things like fetching images.
See also retrieve().
reload()
Reload current document, and return response object.
response()
Return a copy of the current response.
The returned object has the same interface as the object returned by open().
retrieve(fullurl, filename=None, reporthook=None, data=None, timeout=<object object>, open=<built-in function open>)
Returns (filename, headers).
For remote objects, the default filename will refer to a temporary file. Temporary files are removed when the OpenerDirector.close() method is called.
For file: URLs, at present the returned filename is None. This may change in future.
If the actual number of bytes read is less than indicated by the Content-Length header, raises ContentTooShortError (a URLError subclass). The exception's .result attribute contains the (filename, headers) that would have been returned.
select_form(name=None, predicate=None, nr=None, **attrs)
Select an HTML form for input.
This is a bit like giving a form the "input focus" in a browser.
If a form is selected, the Browser object supports the HTMLForm interface, so you can call methods like set_value(), set(), and click().
Another way to select a form is to assign to the .form attribute. The form assigned should be one of the objects returned by the forms() method.
If no matching form is found, mechanize.FormNotFoundError is raised.
If name is specified, then the form must have the indicated name.
If predicate is specified, then the form must match that function. The predicate function is passed the mechanize.HTMLForm as its single argument, and should return a boolean value indicating whether the form matched.
nr, if supplied, is the sequence number of the form (where 0 is the first). Note that form 0 is the first form matching all the other arguments (if supplied); it is not necessarily the first form in the document. The "global form" (consisting of all form controls not contained in any FORM element) is considered not to be part of this sequence and to have no name, so it will not be matched unless both name and nr are None.
You can also match on any HTML attribute of the <form> tag by passing in the attribute name and value as keyword arguments. To turn HTML attribute names into syntactically valid Python keyword arguments, the following simple rule is used: the keyword argument name is converted to an HTML attribute name by removing any trailing underscores and replacing all underscores with hyphens. So class_ matches the class attribute, and data_id matches data-id. You can pass in strings, functions or regular expression objects as the values to match.
set_ca_data(cafile=None, capath=None, cadata=None, context=None)
Set the SSL Context used for connecting to SSL servers.
This method accepts the same arguments as the ssl.SSLContext.load_verify_locations() method from the Python standard library. You can also pass a pre-built ssl.SSLContext via the context keyword argument. Note that to use this feature, you must be using Python >= 2.7.9.
set_client_cert_manager(cert_manager)
Set a mechanize.HTTPClientCertMgr, or None.
set_cookie(cookie_string)
Set a cookie.
Note that it is NOT necessary to call this method under ordinary circumstances: cookie handling is normally entirely automatic. The intended use case is rather to simulate the setting of a cookie by client script in a web page (e.g. JavaScript). In that case, use of this method is necessary because mechanize currently does not support JavaScript, VBScript, etc.
The cookie is added in the same way as if it had arrived with the current response, as a result of the current request. This means that, for example, if it is not appropriate to set the cookie based on the current request, no cookie will be set.
The cookie will be sent automatically with subsequent requests made by the Browser instance whenever that's appropriate.
cookie_string should be a valid value of the Set-Cookie header.
Currently, this method does not allow for adding RFC 2965 cookies. This limitation will be lifted if anybody requests it.
See also set_simple_cookie() for an easier way to set cookies without needing to create a Set-Cookie header string.
set_cookiejar(cookiejar)
Set a mechanize.CookieJar, or None.
set_debug_http(handle)
Print HTTP headers to sys.stdout.
set_debug_redirects(handle)
Log information about HTTP redirects (including refreshes).
Logging is performed using module logging. The logger name is "mechanize.http_redirects". To actually print some debug output, add a handler to that logger and set its level low enough (e.g. logging.DEBUG).
Other logger names relevant to this module:
- mechanize.http_responses
- mechanize.cookies
To turn on everything, configure the parent "mechanize" logger in the same way; the loggers named above are its children.
set_debug_responses(handle)
Log HTTP response bodies.
See set_debug_redirects() for details of logging.
Response objects may be .seek()able if this is set (currently returned responses are, raised HTTPError exception responses are not).
set_handle_equiv(handle, head_parser_class=None)
Set whether to treat HTML http-equiv headers like HTTP headers.
Response objects may be .seek()able if this is set (currently returned responses are, raised HTTPError exception responses are not).
set_handle_gzip(handle)
Add a header indicating to the server that we handle gzip content encoding. Note that if the server sends gzipped content, it is handled automatically in any case, regardless of this setting.
set_handle_redirect(handle)
Set whether to handle HTTP 30x redirections.
set_handle_referer(handle)
Set whether to add a Referer header to each request.
set_handle_refresh(handle, max_time=None, honor_time=True)
Set whether to handle HTTP Refresh headers.
set_handle_robots(handle)
Set whether to observe rules from robots.txt.
set_handled_schemes(schemes)
Set the sequence of URL scheme (protocol) strings to handle.
For example: ua.set_handled_schemes(["http", "ftp"])
If this fails (with ValueError) because you've passed an unknown scheme, the set of handled schemes will not be changed.
set_header(header, value=None)
Convenience method to set a header value in self.addheaders so that the header is sent out with all requests automatically.
set_html(html, url='http://example.com/')
Set the response to a dummy response containing the given HTML, and the given URL if one is supplied.
This lets you parse that HTML, especially to extract forms information. If no URL is given, the default is "http://example.com/".
set_password_manager(password_manager)
Set a mechanize.HTTPPasswordMgrWithDefaultRealm, or None.
set_proxies(proxies=None, proxy_bypass=None)
Configure proxy settings.
The default is to try to obtain proxy settings from the system (see the documentation for urllib.urlopen for information about the system-specific methods used; note that's urllib, not urllib2).
To avoid all use of proxies, pass an empty proxies dict.
set_proxy_password_manager(password_manager)
Set a mechanize.HTTPProxyPasswordMgr, or None.
set_request_gzip(handle)
Add a header indicating to the server that we handle gzip content encoding. Note that if the server sends gzipped content, it is handled automatically in any case, regardless of this setting.
set_response(response)
Replace current response with (a copy of) response.
response may be None.
This is intended mostly for HTML preprocessing.
set_simple_cookie(name, value, domain, path='/')
Similar to set_cookie(), except that instead of using a cookie string, you simply specify the name, value, domain and, optionally, the path. The created cookie will never expire.
submit(*args, **kwds)
Submit current form.
Arguments are as for mechanize.HTMLForm.click().
The return value is the same as for open().
title()
Return title, or None if there is no title element in the document.
viewing_html()
Return whether the current response contains HTML data.
visit_response(response, request=None)
Visit the response, as if it had been opened with open().
Unlike set_response(), this updates history rather than replacing the current response.
mechanize.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, visit=None, timeout=<object object>, method=None)
A request for some network resource. Note that if you specify the method as 'GET' and the data as a dict, then the data will be automatically appended to the URL. If you supply data but leave method as None, then the method will be auto-set to POST and the data will become part of the POST request.
The remaining arguments are for internal use.
add_data(data)
Set the data (a bytestring) to be sent with this request.
add_header(key, val=None)
Add the specified header, replacing the existing one if needed. If val is None, remove the header.
add_unredirected_header(key, val)
Same as add_header() except that this header will not be sent for redirected requests.
get_data()
The data to be sent with this request.
get_header(header_name, default=None)
Get the value of the specified header. If absent, return default.
get_method()
The method used for HTTP requests.
has_data()
True iff there is some data to be sent with this request.
has_header(header_name)
Check if the specified header is present.
has_proxy()
Private method.
header_items()
Get a copy of all headers for this request as a list of 2-tuples.
set_data(data)
Set the data (a bytestring) to be sent with this request.
Response objects in mechanize are seek()able file-like objects that support some additional methods, depending on the protocol used for the connection. The documentation below is for HTTP(S) responses, as these are the most common.
Additional methods present for HTTP responses:
mechanize._mechanize.HTTPResponse
code
The HTTP status code.
getcode()
Return the HTTP status code.
geturl()
Return the URL of the resource retrieved, commonly used to determine if a redirect was followed.
get_all_header_names(normalize=True)
Return a list of all header names. When normalize is True, the case of the header names is normalized.
get_all_header_values(name, normalize=True)
Return a list of all values for the specified header name (which is case-insensitive). Since headers in HTTP can be specified multiple times, the returned value is always a list. See rfc822.Message.getheaders().
info()
Return the headers of the response as an rfc822.Message instance.
__getitem__(header_name)
Return the last HTTP header matching the specified name, as a string. mechanize Response objects act like dictionaries for convenient access to header values, for example response['Date']. You can access header values using the header names, case-insensitively. Note that when more than one header with the same name is present, only the value of the last header is returned; use get_all_header_values() to get the values of all headers.
get(header_name, default=None)
Return the header value for the specified header_name, or default if the header is not present. See __getitem__().
mechanize.Link(base_url, url, text, tag, attrs)
A link in an HTML document.
mechanize.History
Though this will become public, the implied interface is not yet stable.
mechanize._html.content_parser(data, url=None, response_info=None, transport_encoding=None, default_encoding='utf-8', is_html=True)
Parse data (a bytes object) into an etree representation such as xml.etree.ElementTree or lxml.etree.