~All In One~: 03/11/09

11 March, 2009

Tutorial : HTTP

Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. Its use for retrieving inter-linked resources led to the establishment of the World Wide Web.

HTTP development was coordinated by the World Wide Web Consortium and the Internet Engineering Task Force (IETF), culminating in the publication of a series of Requests for Comments (RFCs), most notably RFC 2616 (June 1999), which defines HTTP/1.1, the version of HTTP in common use.

HTTP is a request/response standard between a client and a server. A client is the end-user, the server is the web site. The client making a HTTP request—using a web browser, spider, or other end-user tool—is referred to as the user agent. The responding server—which stores or creates resources such as HTML files and images—is called the origin server. In between the user agent and origin server may be several intermediaries, such as proxies, gateways, and tunnels. HTTP is not constrained to using TCP/IP and its supporting layers, although this is its most popular application on the Internet. Indeed HTTP can be "implemented on top of any other protocol on the Internet, or on other networks. HTTP only presumes a reliable transport; any protocol that provides such guarantees can be used."

Typically, an HTTP client initiates a request. It establishes a Transmission Control Protocol (TCP) connection to a particular port on a host (port 80 by default; see List of TCP and UDP port numbers). An HTTP server listening on that port waits for the client to send a request message. Upon receiving the request, the server sends back a status line, such as "HTTP/1.1 200 OK", and a message of its own, the body of which is perhaps the requested resource, an error message, or some other information.

Resources to be accessed by HTTP are identified using Uniform Resource Identifiers (URIs) (or, more specifically, Uniform Resource Locators (URLs)) using the http: or https URI schemes.
The Internet Protocol Suite
Application Layer
BGP · DHCP · DNS · FTP · GTP · HTTP · IMAP · IRC · Megaco · MGCP · NNTP · NTP · POP · RIP · RPC · RTP · RTSP · SDP · SIP · SMTP · SNMP · SOAP · SSH · STUN · Telnet · TLS/SSL · XMPP · (more)
Transport Layer
TCP · UDP · DCCP · SCTP · RSVP · ECN · (more)
Internet Layer
IP (IPv4, IPv6) · ICMP · ICMPv6 · IGMP · IPsec · (more)
Link Layer
ARP · RARP · NDP · OSPF · Tunnels (L2TP) · Media Access Control (Ethernet, MPLS, DSL, ISDN, FDDI) · Device Drivers · (more)
This box: view • talk • edit

* 1 Request message
* 2 Request methods
o 2.1 Safe methods
o 2.2 Idempotent methods and web applications
* 3 Status codes
* 4 Persistent connections
* 5 HTTP session state
* 6 Secure HTTP
o 6.1 HTTPS URI scheme
o 6.2 HTTP 1.1 Upgrade header
* 7 Sample
* 8 See also
* 9 References
* 10 External links

Request message

The request message consists of the following:

* Request line, such as GET /images/logo.gif HTTP/1.1, which requests a resource called /images/logo.gif from server
* Headers, such as Accept-Language: en
* An empty line
* An optional message body

The request line and headers must all end with (that is, a carriage return followed by a line feed). The empty line must consist of only and no other whitespace. In the HTTP/1.1 protocol, all headers except Host are optional.

A request line containing only the path name is accepted by servers to maintain compatibility with HTTP clients before the HTTP/1.0 specification.

Request methods
A HTTP request made using telnet. The request, response headers and response body are highlighted.

HTTP defines eight methods (sometimes referred to as "verbs") indicating the desired action to be performed on the identified resource. What this resource represents, whether pre-existing data or data that is generated dynamically, depends on the implementation of the server. Often, the resource corresponds to a file or the output of an executable residing on the server.

HEAD
Asks for the response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers, without having to transport the entire content.

GET
Requests a representation of the specified resource. Note that GET should not be used for operations that cause side-effects, such as using it for taking actions in web applications. One reason for this is that GET may be used arbitrarily by robots or crawlers, which should not need to consider the side effects that a request should cause. See safe methods below.
POST
Submits data to be processed (e.g., from an HTML form) to the identified resource. The data is included in the body of the request. This may result in the creation of a new resource or the updates of existing resources or both.
PUT
Uploads a representation of the specified resource.
DELETE
Deletes the specified resource.
TRACE
Echoes back the received request, so that a client can see what intermediate servers are adding or changing in the request.
OPTIONS
Returns the HTTP methods that the server supports for specified URL. This can be used to check the functionality of a web server by requesting '*' instead of a specific resource.
CONNECT
Converts the request connection to a transparent TCP/IP tunnel, usually to facilitate SSL-encrypted communication (HTTPS) through an unencrypted HTTP proxy

HTTP servers are required to implement at least the GET and HEAD method and, whenever possible, also the OPTIONS method.

Safe methods

Some methods (for example, HEAD, GET, OPTIONS and TRACE) are defined as safe, which means they are intended only for information retrieval and should not change the state of the server. In other words, they should not have side effects, beyond relatively harmless effects such as logging, caching, the serving of banner advertisements or incrementing a web counter. Making arbitrary GET requests without regard to the context of the application's state should therefore be considered safe.

By contrast, methods such as POST, PUT and DELETE are intended for actions which may cause side effects either on the server, or external side effects such as financial transactions or transmission of email. Such methods are therefore not usually used by conforming web robots or web crawlers, which tend to make requests without regard to context or consequences.

Despite the prescribed safety of GET requests, in practice their handling by the server is not technically limited in any way, and careless or deliberate programming can just as easily (or more easily, due to lack of user agent precautions) cause non-trivial changes on the server. This is discouraged, because it can cause problems for Web caching, search engines and other automated agents, which can make unintended changes on the server.

Idempotent methods and web applications

Methods PUT and DELETE are defined to be idempotent, meaning that multiple identical requests should have the same effect as a single request. Methods GET, HEAD, OPTIONS and TRACE, being prescribed as safe, should also be idempotent. HTTP is a stateless protocol.

By contrast, the POST method is not necessarily idempotent, and therefore sending an identical POST request multiple times may further affect state or cause further side effects (such as financial transactions). In some cases this may be desirable, but in other cases this could be due to an accident, such as when a user does not realize that their action will result in sending another request, or they did not receive adequate feedback that their first request was successful. While web browsers may show alert dialog boxes to warn users in some cases where reloading a page may re-submit a POST request, it is generally up to the web application to takes responsibility for handling cases where a POST request should not be submitted more than once.

Note that whether a method is idempotent is not enforced by the protocol or web server. It is perfectly possible to write a web application in which (for example) a database insert or other non-idempotent action is triggered by a GET or other request. Ignoring this recommendation, however, may result in undesirable consequences if a user agent assumes that repeating the same request is safe when it isn't.

Status codes
See also: List of HTTP status codes

In HTTP/1.0 and since, the first line of the HTTP response is called the status line and includes a numeric status code (such as "404") and a textual reason phrase (such as "Not Found"). The way the user agent handles the response primarily depends on the code and secondarily on the response headers. Custom status codes can be used since, if the user agent encounters a code it does not recognize, it can use the first digit of the code to determine the general class of the response.

Also, the standard reason phrases are only recommendations and can be replaced with "local equivalents" at the web developer's discretion. If the status code indicated a problem, the user agent might display the reason phrase to the user to provide further information about the nature of the problem. The standard also allows the user agent to attempt to interpret the reason phrase, though this might be unwise since the standard explicitly specifies that status codes are machine-readable and reason phrases are human-readable.

Persistent connections
Main article: HTTP persistent connections

In HTTP/0.9 and 1.0, the connection is closed after a single request/response pair. In HTTP/1.1 a keep-alive-mechanism was introduced, where a connection could be reused for more than one request.

Such persistent connections reduce lag perceptibly, because the client does not need to re-negotiate the TCP connection after the first request has been sent.

Version 1.1 of the protocol made bandwidth optimization improvements to HTTP/1.0. For example, HTTP/1.1 introduced chunked transfer encoding to allow content on persistent connections to be streamed, rather than buffered. HTTP pipelining further reduces lag time, allowing clients to send multiple requests before a previous response has been received to the first one. Another improvement to the protocol was byte serving, which is when a server transmits just the portion of a resource explicitly requested by a client.

HTTP session state

HTTP is a stateless protocol. The advantage of a stateless protocol is that hosts do not need to retain information about users between requests, but this forces web developers to use alternative methods for maintaining users' states. For example, when a host needs to customize the content of a website for a user, the web application must be written to track the user's progress from page to page. A common method for solving this problem involves sending and receiving cookies. Other methods include server side sessions, hidden variables (when the current page is a form), and URL encoded parameters (such as /index.php?session_id=some_unique_session_code).

Secure HTTP

There are currently two methods of establishing a secure HTTP connection: the HTTPS URI scheme and the HTTP 1.1 Upgrade header, introduced by RFC 2817. Browser support for the Upgrade header is, however, nearly non-existent, hence the HTTPS URI scheme is still the dominant method of establishing a secure HTTP connection. Secure HTTP is notated by the prefix HTTPS:// instead of HTTP://

HTTPS URI scheme
Main article: HTTPS

HTTPS: is a URI scheme syntactically identical to the http: scheme used for normal HTTP connections, but which signals the browser to use an added encryption layer of SSL/TLS to protect the traffic. SSL is especially suited for HTTP since it can provide some protection even if only one side of the communication is authenticated. This is the case with HTTP transactions over the Internet, where typically only the server is authenticated (by the client examining the server's certificate).

HTTP 1.1 Upgrade header

HTTP 1.1 introduced support for the Upgrade header. In the exchange, the client begins by making a clear-text request, which is later upgraded to TLS. Either the client or the server may request (or demand) that the connection be upgraded. The most common usage is a clear-text request by the client followed by a server demand to upgrade the connection, which looks like this:

Client:

GET /encrypted-area HTTP/1.1
Host: www.example.com

Server:

HTTP/1.1 426 Upgrade Required
Upgrade: TLS/1.0, HTTP/1.1
Connection: Upgrade

The server returns a 426 status-code because 400 level codes indicate a client failure (see List of HTTP status codes), which correctly alerts legacy clients that the failure was client-related.

The benefits of using this method for establishing a secure connection are:

* that it removes messy and problematic redirection and URL rewriting on the server side,
* it allows virtual hosting of secured websites (although HTTPS also allows this using Server Name Indication), and
* it reduces user confusion by providing a single way to access a particular resource.

A weakness with this method is that the requirement for a secure HTTP cannot be specified in the URI. In practice, the (untrusted) server will thus be responsible for enabling secure HTTP, not the (trusted) client.

Sample

Below is a sample conversation between an HTTP client and an HTTP server running on www.example.com, port 80.

Client request (followed by a blank line, so that request ends with a double newline, each in the form of a carriage return followed by a line feed):

GET /index.html HTTP/1.1
Host: www.example.com

The "Host" header distinguishes between various DNS names sharing a single IP address, allowing name-based virtual hosting. While optional in HTTP/1.0, it is mandatory in HTTP/1.1.

Server response (followed by a blank line and text of the requested page):

HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Etag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Content-Length: 438
Connection: close
Content-Type: text/html; charset=UTF-8

The ETag (entity tag) header is used to determine if the URI cached is identical to the requested URI on the server. Content-Type specifies the Internet media type of the data conveyed by the http message, while Content-Length indicates its length in bytes. The HTTP/1.1 webserver publishes its ability to respond to requests for certain byte ranges of the document by setting the header Accept-Ranges: bytes. This is useful if the client needs to have only certain portions of a resource sent by the server, which is called byte serving. When Connection: close is sent in a header, it means that the web server will close the TCP connection immediately after the transfer of this package.

Tutorial : WWW

"The World Wide Web," "WWW" and "Web surfing" redirects here. For the Web browser, see WorldWideWeb. For other uses, see Web and WWW (disambiguation).
WWW's historic logo designed by Robert Cailliau

The World Wide Web (commonly abbreviated as "the Web") is a very large set of interlinked hypertext documents accessed via the Internet. With a Web browser, one can view Web pages that may contain text, images, videos, and other multimedia and navigate between them using hyperlinks. Using concepts from earlier hypertext systems, the World Wide Web was begun in 1992 by the English physicist Sir Tim Berners-Lee, now the Director of the World Wide Web Consortium, and Robert Cailliau, a Belgian computer scientist, while both were working at CERN in Geneva, Switzerland. In 1990, they proposed building a "web of nodes" storing "hypertext pages" viewed by "browsers" on a network, and released that web in 1992. Connected by the existing Internet, other websites were created, around the world, adding international standards for domain names & the HTML language. Since then, Berners-Lee has played an active role in guiding the development of Web standards (such as the markup languages in which Web pages are composed), and in recent years has advocated his vision of a Semantic Web. Cailliau went on early retirement in January 2005 and left CERN in January 2007.

The World Wide Web enabled the spread of information over the Internet through an easy-to-use and flexible format. It thus played an important role in popularising use of the Internet, to the extent that the World Wide Web has become a synonym for Internet, with the two being conflated in popular use.

* 1 How it works
* 2 History
* 3 Standards
* 4 Privacy
* 5 Security
* 6 Web accessibility
* 7 Java
* 8 JavaScript
* 9 Publishing Web pages
* 10 Statistics
* 11 Speed issues
* 12 Caching
* 13 Link rot and Web archival
* 14 Academic conferences
* 15 WWW prefix in Web addresses
o 15.1 Pronunciation of "www"
* 16 See also
* 17 Notes
* 18 References
* 19 External links

How it works

The terms Internet and World Wide Web are often used in every-day speech without much distinction. However, the Internet and the World Wide Web are not one and the same. The Internet is a global data communications system. It is a hardware and software infrastructure that provides connectivity between computers. In contrast, the Web is one of the services communicated via the Internet. It is a collection of interconnected documents and other resources, linked by hyperlinks and URLs.

Viewing a Web page on the World Wide Web normally begins either by typing the URL of the page into a Web browser, or by following a hyperlink to that page or resource. The Web browser then initiates a series of communication messages, behind the scenes, in order to fetch and display it.

First, the server-name portion of the URL is resolved into an IP address using the global, distributed Internet database known as the domain name system, or DNS. This IP address is necessary to contact and send data packets to the Web server.

The browser then requests the resource by sending an HTTP request to the Web server at that particular address. In the case of a typical Web page, the HTML text of the page is requested first and parsed immediately by the Web browser, which will then make additional requests for images and any other files that form a part of the page. Statistics measuring a website's popularity are usually based on the number of 'page views' or associated server 'hits', or file requests, which take place.

Having received the required files from the Web server, the browser then renders the page onto the screen as specified by its HTML, CSS, and other Web languages. Any images and other resources are incorporated to produce the on-screen Web page that the user sees.

Most Web pages will themselves contain hyperlinks to other related pages and perhaps to downloads, source documents, definitions and other Web resources. Such a collection of useful, related resources, interconnected via hypertext links, is what was dubbed a "web" of information. Making it available on the Internet created what Tim Berners-Lee first called the WorldWideWeb (in its original CamelCase, which was subsequently discarded) in November 1990.

History
Main article: History of the World Wide Web
This NeXT Computer used by Sir Tim Berners-Lee at CERN became the first Web server.

The underlying ideas of the Web can be traced as far back as 1980, when, at CERN in Switzerland, Sir Tim Berners-Lee built ENQUIRE (a reference to Enquire Within Upon Everything, a book he recalled from his youth). While it was rather different from the system in use today, it contained many of the same core ideas (and even some of the ideas of Berners-Lee's next project after the World Wide Web, the Semantic Web).

In March 1989, Berners-Lee wrote a proposal which referenced ENQUIRE and described a more elaborate information management system. With help from Robert Cailliau, he published a more formal proposal (on November 12, 1990) to build a "Hypertext project" called "WorldWideWeb" (one word, also "W3") as a "web of nodes" with "hypertext documents" to store data. That data would be viewed in "hypertext pages" (webpages) by various "browsers" (line-mode or full-screen) on the computer network, using an "access protocol" connecting the "Internet and DECnet protocol worlds".

The proposal had been modeled after EBT's (Electronic Book Technology, a spin-off from the Institute for Research in Information and Scholarship at Brown University) Dynatext SGML reader that CERN had licensed. The Dynatext system, although technically advanced (a key player in the extension of SGML ISO 8879:1986 to Hypermedia within HyTime), was considered too expensive and with an inappropriate licensing policy for general HEP (High Energy Physics) community use: a fee for each document and each time a document was changed.

A NeXT Computer was used by Berners-Lee as the world's first Web server and also to write the first Web browser, WorldWideWeb, in 1990. By Christmas 1990, Berners-Lee had built all the tools necessary for a working Web:the first Web browser (which was a Web editor as well), the first Web server, and the first Web pages which described the project itself.

On August 6, 1991, he posted a short summary of the World Wide Web project on the alt.hypertext newsgroup. This date also marked the debut of the Web as a publicly available service on the Internet.

The first server outside Europe was set up at SLAC in December 1991.

The crucial underlying concept of hypertext originated with older projects from the 1960s, such as the Hypertext Editing System (HES) at Brown University--- among others Ted Nelson and Andries van Dam--- Ted Nelson's Project Xanadu and Douglas Engelbart's oN-Line System (NLS). Both Nelson and Engelbart were in turn inspired by Vannevar Bush's microfilm-based "memex," which was described in the 1945 essay "As We May Think".

Berners-Lee's breakthrough was to marry hypertext to the Internet. In his book Weaving The Web, he explains that he had repeatedly suggested that a marriage between the two technologies was possible to members of both technical communities, but when no one took up his invitation, he finally tackled the project himself. In the process, he developed a system of globally unique identifiers for resources on the Web and elsewhere: the Uniform Resource Identifier.

The World Wide Web had a number of differences from other hypertext systems that were then available. The Web required only unidirectional links rather than bidirectional ones. This made it possible for someone to link to another resource without action by the owner of that resource. It also significantly reduced the difficulty of implementing Web servers and browsers (in comparison to earlier systems), but in turn presented the chronic problem of link rot. Unlike predecessors such as HyperCard, the World Wide Web was non-proprietary, making it possible to develop servers and clients independently and to add extensions without licensing restrictions.

On April 30, 1993, CERN announced that the World Wide Web would be free to anyone, with no fees due. Coming two months after the announcement that the Gopher protocol was no longer free to use, this produced a rapid shift away from Gopher and towards the Web. An early popular Web browser was ViolaWWW, which was based upon HyperCard.

Scholars generally agree that a turning point for the World Wide Web began with the introduction of the Mosaic Web browser in 1993, a graphical browser developed by a team at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign (NCSA-UIUC), led by Marc Andreessen. Funding for Mosaic came from the U.S. High-Performance Computing and Communications Initiative, a funding program initiated by the High Performance Computing and Communication Act of 1991, one of several computing developments initiated by U.S. Senator Al Gore. Prior to the release of Mosaic, graphics were not commonly mixed with text in Web pages, and its popularity was less than older protocols in use over the Internet, such as Gopher and Wide Area Information Servers (WAIS). Mosaic's graphical user interface allowed the Web to become, by far, the most popular Internet protocol.

The World Wide Web Consortium (W3C) was founded by Tim Berners-Lee after he left the European Organization for Nuclear Research (CERN) in October, 1994. It was founded at the Massachusetts Institute of Technology Laboratory for Computer Science (MIT/LCS) with support from the Defense Advanced Research Projects Agency (DARPA)—which had pioneered the Internet—and the European Commission.

Standards
Main article: Web standards

Many formal standards and other technical specifications define the operation of different aspects of the World Wide Web, the Internet, and computer information exchange. Many of the documents are the work of the World Wide Web Consortium (W3C), headed by Berners-Lee, but some are produced by the Internet Engineering Task Force (IETF) and other organizations.

Usually, when Web standards are discussed, the following publications are seen as foundational:

* Recommendations for markup languages, especially HTML and XHTML, from the W3C. These define the structure and interpretation of hypertext documents.
* Recommendations for stylesheets, especially CSS, from the W3C.
* Standards for ECMAScript (usually in the form of JavaScript), from Ecma International.
* Recommendations for the Document Object Model, from W3C.

Additional publications provide definitions of other essential technologies for the World Wide Web, including, but not limited to, the following:

* Uniform Resource Identifier (URI), which is a universal system for referencing resources on the Internet, such as hypertext documents and images. URIs, often called URLs, are defined by the IETF's RFC 3986 / STD 66: Uniform Resource Identifier (URI): Generic Syntax, as well as its predecessors and numerous URI scheme-defining RFCs;
* HyperText Transfer Protocol (HTTP), especially as defined by RFC 2616: HTTP/1.1 and RFC 2617: HTTP Authentication, which specify how the browser and server authenticate each other.

Privacy

Computer users, who save time and money, and who gain conveniences and entertainment, may or may not have surrendered the right to privacy in exchange for using a number of technologies including the Web. Worldwide, more than a half billion people have used a social network service, and of Americans who grew up with the Web, half created an online profile and are part of a generational shift that could be changing norms. Among services paid for by advertising, Yahoo! could collect the most data about users of commercial websites, about 2,500 bits of information per month about each typical user of its site and its affiliated advertising network sites. Yahoo! was followed by MySpace with about half that potential and then by AOL-TimeWarner, Google, Facebook, Microsoft, and eBay.

Privacy representatives from 60 countries have resolved to ask for laws to complement industry self-regulation, for education for children and other minors who use the Web, and for default protections for users of social networks. They also believe data protection for personally identifiable information benefits business more than the sale of that information. Users can opt-in to features in browsers from companies such as Apple, Google, Microsoft (beta) and Mozilla (beta) to clear their personal histories locally and block some cookies and advertising networks but they are still tracked in websites' server logs, and particularly Web beacons.Berners-Lee and colleagues see hope in accountability and appropriate use achieved by extending the Web's architecture to policy awareness, perhaps with audit logging, reasoners and appliances.

Security

The Web has become criminals' preferred pathway for spreading malware. Cybercrime carried out on the Web can include identity theft, fraud, espionage and intelligence gathering. Web-based vulnerabilities now outnumber traditional computer security concerns, and as measured by Google, about one in ten Web pages may contain malicious code. Most Web-based attacks take place on legitimate websites, and most, as measured by Sophos, are hosted in the United States, China and Russia.

The most common of all malware threats is SQL injection attacks against websites. Through HTML and URIs the Web was vulnerable to attacks like cross-site scripting (XSS) that came with the introduction of JavaScript and were exacerbated to some degree by Web 2.0 and Ajax web design that favors the use of scripts. Today by one estimate, 70% of all websites are open to XSS attacks on their users.

Proposed solutions vary to extremes. Large security vendors like McAfee already design governance and compliance suites to meet post-9/11 regulations, and some, like Finjan have recommended active real-time inspection of code and all content regardless of its source. Some have argued that for enterprise to see security as a business opportunity rather than a cost center, "ubiquitous, always-on digital rights management" enforced in the infrastructure by a handful of organizations must replace the hundreds of companies that today secure data and networks. Jonathan Zittrain has said users sharing responsibility for computing safety is far preferable to locking down the Internet.

In terms of security as it relates to the 'physical' portion of the World Wide Web/Internet, the 'distributed' nature of the Internet provides security against attack -- as there is no one single 'focus point' through which all Internet traffic is directed, any attempt to 'cripple' the Internet would only disable a small portion of the whole, and the connecting computers would simply direct the affected traffic through other, unaffected networks and computers.

Web accessibility
Main article: Web accessibility

Many countries regulate web accessibility as a requirement for web sites.

Java

A significant advance in Web technology was Sun Microsystems' Java platform. It enables Web pages to embed small programs (called applets) directly into the view. These applets run on the end-user's computer, providing a richer user interface than simple Web pages. Java client-side applets never gained the popularity that Sun had hoped for a variety of reasons, including lack of integration with other content (applets were confined to small boxes within the rendered page) and the fact that many computers at the time were supplied to end users without a suitably installed Java Virtual Machine, and so required a download by the user before applets would appear. Adobe Flash now performs many of the functions that were originally envisioned for Java applets, including the playing of video content, animation, and some rich GUI features. Java itself has become more widely used as a platform and language for server-side and other programming.

JavaScript

JavaScript, on the other hand, is a scripting language that was initially developed for use within Web pages. The standardized version is ECMAScript. While its name is similar to Java, JavaScript was developed by Netscape and has very little to do with Java, although the syntax of both languages is derived from the C programming language. In conjunction with a Web page's Document Object Model (DOM), JavaScript has become a much more powerful technology than its creators originally envisioned.
The manipulation of a page's DOM after the page is delivered to the client has been called Dynamic HTML (DHTML), to emphasize a shift away from static HTML displays.

In simple cases, all the optional information and actions available on a JavaScript-enhanced Web page will have been downloaded when the page was first delivered. Ajax ("Asynchronous JavaScript and XML") is a group of interrelated web development techniques used for creating interactive web applications that provide a method whereby parts within a Web page may be updated, using new information obtained over the network at a later time in response to user actions. This allows the page to be more responsive, interactive and interesting, without the user having to wait for whole-page reloads. Ajax is seen as an important aspect of what is being called Web 2.0. Examples of Ajax techniques currently in use can be seen in Gmail, Google Maps, and other dynamic Web applications.

Publishing Web pages

Web page production is available to individuals outside the mass media. In order to publish a Web page, one does not have to go through a publisher or other media institution, and potential readers could be found in all corners of the globe.

Many different kinds of information are available on the Web, and for those who wish to know other societies, cultures, and peoples, it has become easier.

The increased opportunity to publish materials is observable in the countless personal and social networking pages, as well as sites by families, small shops, etc., facilitated by the emergence of free Web hosting services.

Statistics

According to a 2001 study, there were massively more than 550 billion documents on the Web, mostly in the invisible Web, or deep Web. A 2002 survey of 2,024 million Web pages determined that by far the most Web content was in English: 56.4%; next were pages in German (7.7%), French (5.6%), and Japanese (4.9%). A more recent study, which used Web searches in 75 different languages to sample the Web, determined that there were over 11.5 billion Web pages in the publicly indexable Web as of the end of January 2005. As of June 2008, the indexable web contains at least 63 billion pages. On July 25, 2008, Google software engineers Jesse Alpert and Nissan Hajaj announced that Google Search had discovered one trillion unique URLs.

Over 100.1 million websites operated as of March 2008. Of these 74% were commercial or other sites operating in the .com generic top-level domain.

Speed issues

Frustration over congestion issues in the Internet infrastructure and the high latency that results in slow browsing has led to an alternative, pejorative name for the World Wide Web: the World Wide Wait. Speeding up the Internet is an ongoing discussion over the use of peering and QoS technologies. Other solutions to reduce the World Wide Wait can be found on W3C.

Standard guidelines for ideal Web response times are:

* 0.1 second (one tenth of a second). Ideal response time. The user doesn't sense any interruption.
* 1 second. Highest acceptable response time. Download times above 1 second interrupt the user experience.
* 10 seconds. Unacceptable response time. The user experience is interrupted and the user is likely to leave the site or system.

These numbers are useful for planning server capacity.

Caching

If a user revisits a Web page after only a short interval, the page data may not need to be re-obtained from the source Web server. Almost all Web browsers cache recently-obtained data, usually on the local hard drive. HTTP requests sent by a browser will usually only ask for data that has changed since the last download. If the locally-cached data are still current, it will be reused.

Caching helps reduce the amount of Web traffic on the Internet. The decision about expiration is made independently for each downloaded file, whether image, stylesheet, JavaScript, HTML, or whatever other content the site may provide. Thus even on sites with highly dynamic content, many of the basic resources only need to be refreshed occasionally. Web site designers find it worthwhile to collate resources such as CSS data and JavaScript into a few site-wide files so that they can be cached efficiently. This helps reduce page download times and lowers demands on the Web server.

There are other components of the Internet that can cache Web content. Corporate and academic firewalls often cache Web resources requested by one user for the benefit of all. (See also Caching proxy server.) Some search engines, such as Google or Yahoo!, also store cached content from websites.

Apart from the facilities built into Web servers that can determine when files have been updated and so need to be re-sent, designers of dynamically-generated Web pages can control the HTTP headers sent back to requesting users, so that transient or sensitive pages are not cached. Internet banking and news sites frequently use this facility.

Data requested with an HTTP 'GET' is likely to be cached if other conditions are met; data obtained in response to a 'POST' is assumed to depend on the data that was POSTed and so is not cached.

Link rot and Web archival
Main article: Link rot

Over time, many Web resources pointed to by hyperlinks disappear, relocate, or are replaced with different content. This phenomenon is referred to in some circles as "link rot" and the hyperlinks affected by it are often called "dead links".

The ephemeral nature of the Web has prompted many efforts to archive Web sites. The Internet Archive is one of the most well-known efforts; it has been active since 1996.

Academic conferences

The major academic event covering the Web is the World Wide Web Conference, promoted by IW3C2.

WWW prefix in Web addresses

The letters "www" are commonly found at the beginning of Web addresses because of the long-standing practice of naming Internet hosts (servers) according to the services they provide. So for example, the host name for a Web server is often "www"; for an FTP server, "ftp"; and for a USENET news server, "news" or "nntp" (after the news protocol NNTP). These host names appear as DNS subdomain names, as in "www.example.com".

This use of such prefixes is not required by any technical standard; indeed, the first Web server was at "nxoc01.cern.ch", and even today many Web sites exist without a "www" prefix. The "www" prefix has no meaning in the way the main Web site is shown. The "www" prefix is simply one choice for a Web site's host name.

However, some website addresses require the www. prefix, and if typed without one, won't work; there are also some which must be typed without the prefix. Sites that do not have Host Headers properly setup are the cause of this. Some hosting companies do not set up a www or @ A record in the web server configuration and/or at the DNS server level.

Some Web browsers will automatically try adding "www." to the beginning, and possibly ".com" to the end, of typed URLs if no host is found without them. All major web browsers will also prefix "http://www." and append ".com" to the address bar contents if the Control and Enter keys are pressed simultaneously. For example, entering "example" in the address bar and then pressing either Enter or Control+Enter will usually resolve to "http://www.example.com", depending on the exact browser version and its settings.

Pronunciation of "www"
Main article: Pronunciation of "www"

In English, "www" is pronounced "double-u double-u double-u". The English writer Douglas Adams once quipped:

The World Wide Web is the only thing I know of whose shortened form takes three times longer to say than what it's short for.

– Douglas Adams, The Independent on Sunday, 1999

It is also interesting that in Mandarin Chinese, "World Wide Web" is commonly translated via a phono-semantic matching to wàn wéi wǎng (万维网), which satisfies "www" and literally means "myriad dimensional net", a translation that very appropriately reflects the design concept and proliferation of the World Wide Web.

Tim Berners-Lee's web-space states that 'World Wide Web' is officially spelled as three separate words, each capitalized, with no intervening hyphens. Additionally, 'Web' (with a capital 'W') is used to indicate its status as an abbreviation.

~All In One~

11 March, 2009

Tutorial : HTTP

Tutorial : WWW

Link Exchange

My Blog Stats

Labels

AdsenseCamp

Blog Stats

Facebook Badge

Search This Blog

AdsenseCamp

Visitor Stats

Followers

ShareThis

Shout Mix

About Me

Links

Avenged Sevenfold - Bat Country

Blog Archive