A Tcl script can use a network socket just like an open file or pipeline. Instead of using the Tcl open command, you use the socket command to open a socket. Then you use gets, puts, and read to transfer data. The close command closes a network socket.
Network programming distinguishes between clients and servers. A server is a process or program that runs for long periods of time and controls access to some resource. For example, an FTP server governs access to files, and an HTTP server provides access to hypertext pages on the World Wide Web. A client typically connects to the server for a limited time in order to gain access to the resource. For example, when a Web browser fetches a hypertext page, it is acting as a client. The extended examples in this chapter show how to program the client side of the HTTP protocol.
set s [socket www.sun.com 80]
There are two forms for host names. The previous example uses a domain name: www.sun.com. You can also specify raw IP addresses, which are written as four dot-separated integers (e.g., 128.15.115.32). A domain name is mapped into a raw IP address by the system software, and it is almost always a better idea to use a domain name in case the IP address assignment for the host changes. This can happen when hosts are upgraded or they move to a different part of the network. As of Tcl 8.0, there is no direct access from Tcl to the DNS service that maps host names to IP addresses. The Scotty Tcl extension provides DNS access and other network protocols. Its home page is:
http://wwwsnmp.cs.utwente.nl/~schoenw/scotty/
Some systems also provide symbolic names for well-known port numbers. For example, instead of using 21 for the FTP service, you can use ftp. On UNIX systems the well-known port numbers are listed in the file named /etc/services.
socket ?-async? ?-myaddr address? ?-myport myport? host port
Ordinarily the address and port on the client side are chosen automatically. If your computer has multiple network interfaces you can select one with the -myaddr option. The address value can be a domain name or an IP address. If your application needs a specific client port, it can choose one with the -myport option. If the port is in use, the socket command will raise an error.
In some cases it can take a long time to open the connection to the server. The -async option causes the connection to happen in the background, and the socket command returns immediately. The socket becomes writable when the connection completes or fails. You can use fileevent to get a callback when this occurs. If you use the socket before the connection completes, and the socket is in blocking mode, then Tcl automatically blocks and waits for the connection to complete. If the socket is in non-blocking mode, attempts to use the socket return immediately. In this situation the gets command would return -1, read would return an empty string, and fblocked would return 1. The following example illustrates -async. One advantage of this approach is that the Tcl event loop is active while your application waits for the connection:
set sock [socket -async host port]
fileevent $sock w {set connected 1}
global connected
vwait connected
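The fragment above waits indefinitely if the server never answers. As a minimal sketch, assuming a time limit is wanted, the same pattern can race the writable event against an after timer. ConnectWithTimeout and the connectState array are hypothetical names, not part of Tcl, and the sketch assumes that fconfigure -error is available in your Tcl release to report a failed connection attempt.

```tcl
# Sketch: open a client socket in the background and give up after
# $ms milliseconds. ConnectWithTimeout and connectState are
# illustrative names, not part of Tcl.
proc ConnectWithTimeout {host port ms} {
    global connectState
    set sock [socket -async $host $port]
    set connectState($sock) pending
    # The socket becomes writable when the connection completes or fails.
    fileevent $sock writable [list set connectState($sock) ready]
    set timer [after $ms [list set connectState($sock) timeout]]
    # The event loop keeps running while we wait.
    vwait connectState($sock)
    after cancel $timer
    fileevent $sock writable {}
    set result $connectState($sock)
    unset connectState($sock)
    if {[string compare $result timeout] == 0} {
        close $sock
        error "connection to $host:$port timed out"
    }
    # Assumption: fconfigure -error reports a failed connection attempt.
    set err [fconfigure $sock -error]
    if {[string length $err] > 0} {
        close $sock
        error "connection to $host:$port failed: $err"
    }
    return $sock
}
```

A caller would typically wrap the call in catch, for example catch {ConnectWithTimeout www.sun.com 80 5000} sock, and treat a non-zero result as a failed connection.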
set mainSocket [socket -server Accept 2540]
proc Accept {newSock addr port} {
    puts "Accepted $newSock from $addr port $port"
}
vwait forever
This example creates a server socket and specifies the Accept command as the server callback. In this simple example, Accept just prints out its arguments. The last argument to the socket command is the server's port number. For your own unofficial servers, you'll need to pick port numbers higher than 1024 to avoid conflicts with existing services. UNIX systems prevent user programs from opening server sockets with port numbers less than 1024.
The vwait command puts Tcl into its event loop so it can do the background processing necessary to accept connections. The vwait command will wait until the forever variable is modified, which won't happen in this simple example. The key point is that Tcl processes other events (e.g., network connections and other file I/O) while it waits. If you have a Tk application (e.g., wish), then it already has an event loop to handle window system events, so you do not need to use vwait. The Tcl event loop is discussed on page 177.
Server Socket Options
By default, Tcl lets the operating system choose the network interface used for the server socket, and you just supply the port number. If your computer has multiple interfaces you may want to specify a particular one. Use the -myaddr option for this. The general form of the command to open server sockets is:
socket -server callback ?-myaddr address? port
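As a small illustration of -myaddr, the sketch below binds a server socket to the loopback interface only, so that only clients on the same machine can connect. LocalAccept is a hypothetical callback name. The sketch also assumes that a port number of 0 asks the operating system to pick any free port, and that fconfigure -sockname reports the address and port actually in use; check the socket manual page for your Tcl release.

```tcl
# Sketch: a server socket bound to the loopback interface only.
# LocalAccept is an illustrative callback name.
proc LocalAccept {sock addr port} {
    puts "Accepted $sock from $addr port $port"
    close $sock
}
# Assumption: port 0 asks the operating system for any free port.
set listener [socket -server LocalAccept -myaddr 127.0.0.1 0]
# -sockname returns a list: address, host name, and port.
set where [fconfigure $listener -sockname]
puts "Listening on [lindex $where 0] port [lindex $where 2]"
close $listener
```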
proc Echo_Server {port} {
    global echo
    set echo(main) [socket -server EchoAccept $port]
}
proc EchoAccept {sock addr port} {
    global echo
    puts "Accept $sock from $addr port $port"
    set echo(addr,$sock) [list $addr $port]
    fconfigure $sock -buffering line
    fileevent $sock readable [list Echo $sock]
}
proc Echo {sock} {
    global echo
    if {[eof $sock] || [catch {gets $sock line}]} {
        # end of file or abnormal connection drop
        close $sock
        puts "Close $echo(addr,$sock)"
        unset echo(addr,$sock)
    } else {
        if {[string compare $line "quit"] == 0} {
            # Prevent new connections.
            # Existing connections stay open.
            close $echo(main)
        }
        puts $sock $line
    }
}
The Echo_Server procedure opens the socket and saves the result in echo(main). When this socket is closed later, the server stops accepting new connections but existing connections won't be affected. If you want to experiment with this server, start it and wait for connections like this:
Echo_Server 2540
vwait forever
The EchoAccept procedure uses the fconfigure command to set up line buffering. This means that each puts by the server results in a network transmission to the client. The importance of this will be described in more detail later. A complete description of the fconfigure command is given on page 181. The EchoAccept procedure uses the fileevent command to register a procedure that handles I/O on the socket. In this example, the Echo procedure will be called whenever the socket is readable. Note that it is not necessary to put the socket into non-blocking mode when using the fileevent callback. The effects of non-blocking mode are discussed on page 181.
EchoAccept saves information about each client in the echo array. This is just used to print out a message when a client closes its connection. In a more sophisticated server, however, you may need to keep more interesting state about each client. The name of the socket provides a convenient handle on the client. In this case it is used as part of the array index.
if {[eof $sock] || [catch {gets $sock line}]} {
Closing the socket automatically clears the fileevent registration. If you forget to close the socket upon the end of file condition, the Tcl event loop will invoke your callback repeatedly. It is important to close it when you detect end of file.
In the normal case the server simply reads a line with gets and then writes it back to the client with puts. If the line is "quit," then the server closes its main socket. This prevents any more connections by new clients, but it doesn't affect any clients that are already connected.
proc Echo_Client {host port} {
    set s [socket $host $port]
    fconfigure $s -buffering line
    return $s
}
set s [Echo_Client localhost 2540]
puts $s "Hello!"
gets $s
=> Hello!
Example 16-3 shows a sample client of the Echo service. The main point is to ensure the socket is line buffered so that each puts by the client results in a network transmission. (Or, more precisely, each newline character results in a network transmission.) If you forget to set line buffering with fconfigure, the client's gets command will probably hang because the server will not get any data; it will be stuck in buffers on the client.
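To try the echo exchange without starting two programs, a server and a client can share one interpreter, provided the client does not block the event loop. The following is a sketch under that assumption: all Demo names are hypothetical, the server is a cut-down version of the Echo procedure, and port 0 is assumed to let the operating system pick a free port. The client reads its reply from a fileevent handler, so the server callbacks can run while vwait is active.

```tcl
# Sketch: an echo round trip inside one process. All Demo* names are
# illustrative. The client reads with fileevent instead of a blocking
# gets, so the event loop stays free to run the server callbacks.
proc DemoAccept {sock addr port} {
    fconfigure $sock -buffering line
    fileevent $sock readable [list DemoEcho $sock]
}
proc DemoEcho {sock} {
    if {[eof $sock] || [catch {gets $sock line}]} {
        close $sock
    } else {
        puts $sock $line
    }
}
proc DemoClientRead {sock} {
    global reply
    if {[gets $sock line] >= 0} {
        set reply $line
    } elseif {[eof $sock]} {
        set reply {}
    }
}
# Assumption: port 0 lets the operating system pick a free port.
set server [socket -server DemoAccept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $server -sockname] 2]
set s [socket 127.0.0.1 $port]
# Line buffering ensures the puts below is sent right away.
fconfigure $s -buffering line
fileevent $s readable [list DemoClientRead $s]
puts $s "Hello!"
vwait reply
puts "Server echoed: $reply"
close $s
close $server
```

A blocking gets in the client would hang here, because the server callbacks only run while the event loop is active.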
Fetching a URL with HTTP
The Hypertext Transfer Protocol (HTTP) is the protocol used on the World Wide Web. This section presents a procedure to fetch pages or images from a server on the Web. Items on the Web are identified with a Uniform Resource Locator (URL) that specifies a host, port, and location on the host. The basic outline of HTTP is that a client sends a URL to a server, and the server responds with some header information and some content data. The header information describes the content, which can be hypertext, images, PostScript, and more.
proc Http_Open {url} {
    global http
    if {![regexp -nocase {^(http://)?([^:/]+)(:([0-9]+))?(/.*)} \
            $url x protocol server y port path]} {
        error "bogus URL: $url"
    }
    if {[string length $port] == 0} {
        set port 80
    }
    set sock [socket $server $port]
    puts $sock "GET $path HTTP/1.0"
    puts $sock "Host: $server"
    puts $sock "User-Agent: Tcl/Tk Http_Open"
    puts $sock ""
    flush $sock
    return $sock
}
The Http_Open procedure uses regexp to pick out the server and port from the URL. This regular expression is described in detail on page 123. The leading http:// is optional, and so is the port number. If the port is left off, then the standard port 80 is used. If the regular expression matches, then a socket command opens the network connection.
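The URL pattern can be tried on its own, without opening a socket. The sketch below wraps it in a helper; UrlSplit is a hypothetical name, and the port group is written as (:([0-9]+))? so that every digit of the port is captured.

```tcl
# Sketch: the URL pattern from Http_Open, wrapped in a helper so it
# can be tested without a network. UrlSplit is an illustrative name.
proc UrlSplit {url} {
    if {![regexp -nocase {^(http://)?([^:/]+)(:([0-9]+))?(/.*)} \
            $url x protocol server y port path]} {
        error "bogus URL: $url"
    }
    # An omitted port leaves the variable empty; default to 80.
    if {[string length $port] == 0} {
        set port 80
    }
    return [list $server $port $path]
}
puts [UrlSplit http://www.sun.com:8080/index.html]
# prints: www.sun.com 8080 /index.html
puts [UrlSplit www.sun.com/]
# prints: www.sun.com 80 /
```

Note that the pattern requires a path, so a URL without a trailing slash, such as http://www.sun.com, is rejected as bogus.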
key: value
The Host identifies the server, which supports servers that implement more than one server name. The User-Agent identifies the client program, which is often a browser like Netscape Navigator or Internet Explorer. The key-value lines are terminated with a blank line. This data is flushed out of the Tcl buffering system with the flush command. The server will respond by sending the URL contents back over the socket. This is described shortly, but first we consider proxies.
# Http_Proxy sets or queries the proxy
proc Http_Proxy {{new {}}} {
    global http
    if {[string length $new] == 0} {
        # Query: return the current setting, or {} if none is set
        if {![info exists http(proxy)]} {
            return {}
        }
        return $http(proxy):$http(proxyPort)
    } else {
        # Set: new has the form host:port
        regexp {^([^:]+):([0-9]+)$} $new x \
            http(proxy) http(proxyPort)
    }
}
proc Http_Open {url {command GET} {query {}}} {
    global http
    if {![regexp -nocase {^(http://)?([^:/]+)(:([0-9]+))?(/.*)} \
            $url x protocol server y port path]} {
        error "bogus URL: $url"
    }
    if {[string length $port] == 0} {
        set port 80
    }
    if {[info exists http(proxy)] &&
            [string length $http(proxy)]} {
        # Connect to the proxy and send it the complete URL
        set sock [socket $http(proxy) $http(proxyPort)]
        puts $sock "$command http://$server:$port$path HTTP/1.0"
    } else {
        set sock [socket $server $port]
        puts $sock "$command $path HTTP/1.0"
    }
    puts $sock "User-Agent: Tcl/Tk Http_Open"
    puts $sock "Host: $server"
    if {[string length $query] > 0} {
        puts $sock "Content-Length: [string length $query]"
        puts $sock ""
        puts $sock $query
    }
    puts $sock ""
    flush $sock
    fconfigure $sock -blocking 0
    return $sock
}
proc Http_Head {url} {
    upvar #0 $url state
    catch {unset state}
    set state(sock) [Http_Open $url HEAD]
    fileevent $state(sock) readable [list HttpHeader $url]
    # Specify the real name, not the upvar alias, to vwait
    vwait $url\(status)
    catch {close $state(sock)}
    return $state(status)
}
proc HttpHeader {url} {
    upvar #0 $url state
    if {[eof $state(sock)]} {
        set state(status) eof
        close $state(sock)
        return
    }
    if {[catch {gets $state(sock) line} nbytes]} {
        set state(status) error
        lappend state(headers) [list error $nbytes]
        close $state(sock)
        return
    }
    if {$nbytes < 0} {
        # Read would block
        return
    } elseif {$nbytes == 0} {
        # Header complete
        set state(status) head
    } elseif {![info exists state(headers)]} {
        # Initial status reply from the server
        set state(headers) [list http $line]
    } else {
        # Process key-value pairs
        regexp {^([^:]+): *(.*)$} $line x key value
        lappend state(headers) [string tolower $key] $value
    }
}
The Http_Head procedure uses Http_Open to contact the server. The HttpHeader procedure is registered as a fileevent handler to read the server's reply. A global array keeps state about each operation. The URL is used in the array name, and upvar is used to create an alias to the name (upvar is described on page 80):
upvar #0 $url state
You cannot use the upvar alias as the variable specified to vwait. Instead, you must use the actual name. The backslash turns off the array reference in order to pass the name of the array element to vwait, otherwise Tcl tries to reference url as an array:
vwait $url\(status)
The HttpHeader procedure checks for special cases: end of file, an error on the gets, or a short read on a non-blocking socket. The very first reply line contains a status code from the server that is in a different format than the rest of the header lines:
version code message
The version names the protocol, such as HTTP/1.0. The code is a 3-digit numeric code. 200 is OK. Codes in the 400's and 500's indicate an error. The codes are explained fully in RFC 1945, which specifies HTTP 1.0. The first line is saved with the key http:
set state(headers) [list http $line]
The rest of the header lines are parsed into key-value pairs and appended onto state(headers). This format can be used to initialize an array:
array set header $state(headers)
When HttpHeader gets an empty line, the header is complete and it sets the state(status) variable, which signals Http_Head. Finally, Http_Head returns the status to its caller. The complete information about the request is still in the global array named by the URL. Example 16-7 illustrates the use of Http_Head:
set url http://www.sun.com/
set status [Http_Head $url]
=> eof
upvar #0 $url state
array set info $state(headers)
parray info
info(http) HTTP/1.0 200 OK
info(server) Apache/1.1.1
info(last-modified) Nov ...
info(content-type) text/html
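The key-value format of state(headers) can be explored without a network connection. The sketch below applies the same parsing rules as HttpHeader to a canned reply; ParseHeaders and the sample reply lines are hypothetical, chosen only for illustration. It also shows one way to pull the numeric status code out of the first line.

```tcl
# Sketch: turn reply lines into the flat key-value list that
# array set expects. ParseHeaders is an illustrative name.
proc ParseHeaders {lines} {
    set headers {}
    foreach line $lines {
        if {[llength $headers] == 0} {
            # The first line is the status reply, saved under http
            set headers [list http $line]
        } elseif {[regexp {^([^:]+): *(.*)$} $line x key value]} {
            # Keys are folded to lower case for easy lookup
            lappend headers [string tolower $key] $value
        }
    }
    return $headers
}
# A made-up reply, for illustration only
set reply [list "HTTP/1.0 200 OK" \
    "Content-Type: text/html" \
    "Server: Apache/1.1.1"]
array set header [ParseHeaders $reply]
# Extract the 3-digit code from the status line
regexp {^HTTP/[0-9.]+ +([0-9]+)} $header(http) x code
puts "$code $header(content-type)"
# prints: 200 text/html
```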
proc Http_Get {url {query {}}} {
    upvar #0 $url state    ;# Alias to global array
    catch {unset state}    ;# Aliases still valid.
    if {[string length $query] > 0} {
        set state(sock) [Http_Open $url POST $query]
    } else {
        set state(sock) [Http_Open $url GET]
    }
    set sock $state(sock)
    fileevent $sock readable [list HttpHeader $url]
    # Specify the real name, not the upvar alias, to vwait
    vwait $url\(status)
    set header(content-type) {}
    set header(http) "500 unknown error"
    array set header $state(headers)
    # Check return status.
    # 200 is OK, other codes indicate a problem.
    regsub "HTTP/1.. " $header(http) {} header(http)
    if {![string match 2* $header(http)]} {
        catch {close $sock}
        if {[info exists header(location)] &&
                [string match 3* $header(http)]} {
            # 3xx is a redirection to another URL
            set state(link) $header(location)
            return [Http_Get $header(location) $query]
        }
        return -code error $header(http)
    }
    # Set up to read the content data
    switch -glob -- $header(content-type) {
        text/* {
            # Read HTML into memory
            fileevent $sock readable [list HttpGetText $url]
        }
        default {
            # Copy content data to a file
            fconfigure $sock -translation binary
            set state(filename) [File_TempName http]
            if {[catch {open $state(filename) w} out]} {
                set state(status) error
                set state(error) $out
                close $sock
                return $header(content-type)
            }
            set state(fd) $out
            fileevent $sock readable [list HttpCopyData $url]
        }
    }
    vwait $url\(status)
    return $header(content-type)
}
Http_Get uses Http_Open to initiate the request, and then it looks for errors. It handles redirection errors that occur if a URL has changed. These have error codes that begin with 3. A common case of this is when a user omits the trailing slash on a URL (e.g., http://www.sun.com). Most servers respond with:
302 Document has moved
Location: http://www.sun.com/
If the content-type is text, then Http_Get sets up a fileevent handler to read this data into memory. The socket is in non-blocking mode so the read handler can read as much data as possible each time it is called. This is more efficient than using gets to read a line at a time. The text will be stored in the state(body) variable for use by the caller of Http_Get. Example 16-9 shows the HttpGetText fileevent handler:
proc HttpGetText {url} {
    upvar #0 $url state
    if {[eof $state(sock)]} {
        # Content complete
        set state(status) done
        close $state(sock)
    } elseif {[catch {read $state(sock)} block]} {
        set state(status) error
        lappend state(headers) [list error $block]
        close $state(sock)
    } else {
        append state(body) $block
    }
}
The content may be in binary format. This poses a problem for Tcl 7.6 and earlier. A null character will terminate the value, so values with embedded nulls cannot be processed safely by Tcl scripts. Tcl 8.0 supports strings and variable values with arbitrary binary data. Example 16-10 shows HttpCopyData that is used by Http_Get to copy non-text content data to a file. HttpCopyData uses an undocumented Tcl command, unsupported0, to copy data from one I/O channel to another without storing it in Tcl variables. This command has been replaced with fcopy in Tcl 8.0.
rename unsupported0 copychannel
proc HttpCopyData {url} {
    upvar #0 $url state
    if {[eof $state(sock)]} {
        # Content complete
        set state(status) done
        close $state(sock)
        close $state(fd)
    } elseif {[catch {copychannel $state(sock) $state(fd)} x]} {
        set state(status) error
        lappend state(headers) [list error $x]
        close $state(sock)
        close $state(fd)
    }
}
The user of Http_Get uses the information in the state array to determine the status of the fetch and where to find the content. There are four cases to deal with:
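An error occurred, in which case state(status) is error. The request was redirected, in which case state(link) holds the new URL; the caller should look at the state for that URL instead, and can use upvar to redefine the alias:
upvar #0 $state(link) state
The content was text, which is stored in state(body). The content was some other type, which was copied to the file named by state(filename).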
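A caller can sort out these cases by testing which elements exist in the state array. The following is a sketch only: HttpResult is a hypothetical helper, and it relies on the elements that Http_Get and its handlers set (status, link, body, and filename). The array is filled in by hand here so the sketch runs without a network connection.

```tcl
# Sketch: classify the outcome recorded in the state array for a URL.
# HttpResult is an illustrative name, not part of the chapter's code.
proc HttpResult {url} {
    upvar #0 $url state
    if {[info exists state(status)] &&
            [string compare $state(status) error] == 0} {
        return [list error]
    } elseif {[info exists state(link)]} {
        # Redirected: the content is under the new URL's array
        return [list redirect $state(link)]
    } elseif {[info exists state(body)]} {
        return [list text $state(body)]
    } elseif {[info exists state(filename)]} {
        return [list file $state(filename)]
    }
    return [list unknown]
}
# Fake a completed text fetch, for illustration only
set url http://www.sun.com/
upvar #0 $url state
set state(status) done
set state(body) hello
set result [HttpResult $url]
puts $result
# prints: text hello
```

The error test comes first because state(filename) is set before the output file is opened, so it can exist even when the fetch failed.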
unsupported0 input output ?chunksize?
The command reads from the input channel and writes to the output channel. The number of bytes transferred is returned. If chunksize is specified, then at most this many bytes are read from input. If input is in blocking mode, then unsupported0 will block until chunksize bytes are read, or until end of file. If input is non-blocking, all available data from input is read, up to chunksize bytes, and copied to output. If output is non-blocking, then unsupported0 queues all the data read from input and returns. Otherwise, unsupported0 could block when writing to output.
fcopy input output ?-size size? ?-command callback?
The -command argument makes fcopy work in the background. When the copy is complete or an error occurs, the callback is invoked with one or two additional arguments: the number of bytes copied and, in the case of an error, an error string:
proc CopyDone {in out bytes {error {}}} {
    close $in ; close $out
}
With a background copy, the fcopy command transfers data from input until end of file or size bytes have been transferred. If no -size argument is given, then the copy goes until end of file. It is not safe to do other I/O operations with input or output during a background fcopy. If either input or output is closed while the copy is in progress, the current copy is stopped. If the input is closed, then all data already queued for output is written out.
Without a -command argument, the fcopy command is much like the unsupported0 command. It reads as much as possible depending on the blocking mode of input and the optional size parameter. Everything it reads is queued for output before fcopy returns. If output is blocking, then fcopy returns after the data is written out. If input is blocking, then fcopy can block attempting to read size bytes or until end of file.
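A background copy can be tried with two ordinary files. In the sketch below, DemoCopyDone, CopyFile, the copyStatus variable, and the scratch file names are all hypothetical. The callback is registered with the two channels as leading arguments, and fcopy appends the byte count and, on failure, an error string.

```tcl
# Sketch: copy one file to another in the background with fcopy.
# DemoCopyDone, CopyFile, and copyStatus are illustrative names.
proc DemoCopyDone {in out bytes {error {}}} {
    global copyStatus
    close $in
    close $out
    if {[string length $error] > 0} {
        set copyStatus [list error $error]
    } else {
        set copyStatus [list ok $bytes]
    }
}
proc CopyFile {src dst} {
    global copyStatus
    set in [open $src r]
    set out [open $dst w]
    # Copy the bytes unchanged
    fconfigure $in -translation binary
    fconfigure $out -translation binary
    fcopy $in $out -command [list DemoCopyDone $in $out]
    # The copy proceeds from the event loop
    vwait copyStatus
    return $copyStatus
}
# Create a small scratch file, copy it, and clean up
set src fcopy_demo_[pid].in
set dst fcopy_demo_[pid].out
set f [open $src w]
puts -nonewline $f "hello world"
close $f
set copyResult [CopyFile $src $dst]
puts $copyResult
# prints: ok 11
file delete $src $dst
```

The vwait here is only for the demonstration; in an application with its own event loop, the callback would simply record completion while other work continues.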
http_config
=> -accept */* -proxyfilter httpProxyRequired -proxyhost {} -proxyport {} -timeout unlimited
If you specify just one option, its value is returned:
http_config -proxyfilter
=> httpProxyRequired
You can set one or more options:
http_config -proxyhost webcache.eng -proxyport 8080
The default proxy filter just returns the -proxyhost and -proxyport values if they are set. You can supply a smarter filter that picks a proxy based on the host in the URL. The proxy filter is called with the hostname and should return a list of two elements, the proxy host and port. If no proxy is required, return an empty list.
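As a sketch of such a filter, the procedure below sends local hosts directly and everything else through a cache. PickProxy and the example.com domain are hypothetical, and the proxy address simply reuses the one from the http_config example. The registration line is left as a comment so the sketch runs without the package loaded.

```tcl
# Sketch of a proxy filter. PickProxy and the example.com domain are
# illustrative assumptions.
proc PickProxy {host} {
    # Hosts that are reached directly: no proxy required
    if {[string compare $host localhost] == 0 ||
            [string match *.example.com $host]} {
        return {}
    }
    # Everything else goes through the cache
    return [list webcache.eng 8080]
}
# To install the filter:
# http_config -proxyfilter PickProxy
puts [PickProxy www.sun.com]
# prints: webcache.eng 8080
puts [llength [PickProxy intranet.example.com]]
# prints: 0
```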
The -timeout value limits the time the transaction can take. Its value is unlimited for no timeout, or a seconds value. You can specify 0.5, for example, to have a 500 millisecond timeout.
For simple applications you can just block on the transaction:
set token [http_get www.sun.com/index.html]
=> http#1
The leading http:// in the URL is optional. The return value is a token that is also the name of a global array that contains state about the transaction. Names like http#1 are used instead of using the URL as the array name. You can use upvar to convert the return value from http_get to an array variable:
upvar #0 $token state
By default, the URL data is saved in state(body). The elements of the state array are described in Table 16-2:
A handful of access functions are provided so you can avoid using the state array directly. These are listed in Table 16-3:
You can take advantage of the asynchronous interface by specifying a command that is called when the transaction completes. The callback is passed the token returned from http_get so it can access the transaction state:
proc Url_Display {text url token} {
    upvar #0 $token state
    # Display the url in text
}
You can have http_get copy the URL to a file or socket with the -channel option. This is useful for downloading large files or images. In this case you can get a progress callback so you can provide user feedback during the transaction. Example 16-11 shows a simple downloading script:
#!/usr/local/tclsh8.0
if {$argc < 2} {
    puts stderr "Usage: $argv0 url file"
    exit 1
}
set url [lindex $argv 0]
set file [lindex $argv 1]
set out [open $file w]
proc progress {token total current} {
    puts -nonewline "."
}
http_config -proxyhost webcache.eng -proxyport 8080
set token [http_get $url -progress progress \
    -headers {Pragma no-cache} -channel $out]
close $out
# Print out the return header information
puts ""
upvar #0 $token state
puts $state(http)
foreach {key value} $state(meta) {
    puts "$key: $value"
}
exit 0
http_formatQuery name "Brent Welch" title "Tcl Programmer"
=> name=Brent+Welch&title=Tcl+Programmer
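To make the encoding concrete, here is a simplified sketch of this kind of query formatting. It is not the package's implementation: QueryEncode and QueryEscape are hypothetical names, and the sketch handles only single-byte characters. Letters and digits pass through, a space becomes a plus sign, and any other character becomes a percent sign followed by its two-digit hexadecimal code. Names and values are joined with = and pairs are joined with &.

```tcl
# Sketch: a simplified query encoder, for illustration only. It is
# not the http package's code and assumes single-byte characters.
proc QueryEscape {s} {
    set out {}
    foreach c [split $s {}] {
        if {[string match {[a-zA-Z0-9]} $c]} {
            append out $c
        } elseif {[string compare $c " "] == 0} {
            append out +
        } else {
            # Convert the character to its numeric code, then to %XX
            scan $c %c code
            append out [format %%%02X $code]
        }
    }
    return $out
}
proc QueryEncode {args} {
    set result {}
    set sep {}
    foreach {name value} $args {
        append result $sep [QueryEscape $name] = [QueryEscape $value]
        set sep &
    }
    return $result
}
puts [QueryEncode name "Brent Welch" title "Tcl Programmer"]
# prints: name=Brent+Welch&title=Tcl+Programmer
```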
http_reset $token
This is done automatically when you set up a -timeout with http_config.
http::geturl
http::config
http::formatQuery
http::reset
http::data
http::status
http::error
http::code
http::wait
The state arrays are inside the ::http namespace, too, but you can use the same upvar #0 trick to make an alias to the array.
In either case you must use package require before using any of the procedures in the package.
package require http 1.0 ;# http_get
package require http 2.0 ;# http::geturl