HTTP-based Systems
- HTTP (Hypertext Transfer Protocol)
The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It has been the foundation of data communication for the World Wide Web since 1990. HTTP is a generic, stateless protocol that can also be used for other purposes through extensions of its request methods, status codes, and headers.
HTTP is a TCP/IP-based communication protocol used to deliver data (HTML files, image files, query results, etc.) on the World Wide Web. The default port is TCP 80, but other ports can be used as well. It provides a standardized way for computers to communicate with each other: the HTTP specification defines how a client's request is constructed and sent to the server, and how the server responds to that request.
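To make "how a client's request is constructed" concrete, here is a minimal sketch of serializing an HTTP/1.1 request into the bytes a client would write to the TCP connection. The function name `build_request` and the host `example.com` are illustrative assumptions, not part of any library.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Serialize an HTTP/1.1 request into the bytes sent over TCP.

    Hypothetical helper for illustration; real clients add more headers.
    """
    # Request line, followed by the Host header that HTTP/1.1 requires.
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # Tell the server how many body bytes follow the headers.
        lines.append(f"Content-Length: {len(body)}")
    # Lines end with CRLF; an empty line separates headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

Calling `build_request("GET", "/index.html", "example.com")` yields a request line, a `Host` header, and the blank line that ends the header section.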
There are three basic features that make HTTP a simple but powerful protocol:
- HTTP is connectionless: The HTTP client (for example, a browser) initiates a request and then waits for the response. The server processes the request and sends a response back, after which the client can close the connection. The client and server therefore know about each other only during the current request and response, and a later request is handled as if the two had never communicated. (HTTP/1.1 and later usually keep the underlying TCP connection open and reuse it for performance, but each request is still handled independently.)
- HTTP is media independent: Any type of data can be sent over HTTP as long as both the client and the server know how to handle the content. The sender, whether client or server, specifies the content type using the appropriate MIME type.
- HTTP is stateless: As mentioned above, HTTP is connectionless, which is a direct result of HTTP being a stateless protocol. The server and client are aware of each other only during the current request; afterwards, both forget about each other. Because of this, neither the client nor the server retains information between different requests across web pages unless that information is sent again with each request.
HTTP is a protocol for fetching resources, such as HTML documents. It is the foundation of any data exchange on the Web, and it is a client-server protocol, which means requests are initiated by the recipient of the resource, usually the Web browser. A complete document is reconstructed from the different sub-documents fetched, for instance text, layout description, images, videos, scripts, and more.
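The statelessness described in the list above can be sketched with a handler that keeps no memory of its own. In this illustration, any "state" has to travel inside the request, here as a cookie that the client sends back. The names `handle_request` and `simulate_client` and the `visits` cookie are assumptions made for the example.

```python
from http.cookies import SimpleCookie


def handle_request(request_headers):
    """Stateless handler: everything it knows comes from this one request."""
    cookie = SimpleCookie()
    cookie.load(request_headers.get("Cookie", ""))
    # Without a cookie in the request, the server cannot tell visits apart.
    visits = int(cookie["visits"].value) if "visits" in cookie else 0
    visits += 1
    response_headers = {
        "Content-Type": "text/plain",      # MIME type of the response body
        "Set-Cookie": f"visits={visits}",  # state handed back to the client
    }
    return response_headers, f"visit number {visits}"


def simulate_client(n_requests, resend_cookie):
    """Send n requests; optionally return the server's cookie each time."""
    headers, body = {}, ""
    for _ in range(n_requests):
        response_headers, body = handle_request(headers)
        if resend_cookie:
            headers = {"Cookie": response_headers["Set-Cookie"]}
    return body
```

A client that never returns the cookie is treated as a first-time visitor on every request, while one that resends it sees the count increase.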
Please refer to Wikipedia: HTTP for more details.
- How HTTP Works
HTTP (Hypertext Transfer Protocol) is a request-response protocol: a client sends a request to a server and receives a response back, typically containing a file or other resource. HTTP is the foundation of data communication on the Web.
HTTPS is an evolution of HTTP in which the "S" stands for "Secure". It runs HTTP over an encrypted connection provided by TLS (Transport Layer Security), the successor of SSL (Secure Sockets Layer), which makes it more secure than plain HTTP.
As a request-response protocol, HTTP gives users a way to interact with web resources such as HTML files by transmitting hypertext messages between clients and servers. HTTP clients generally use Transmission Control Protocol (TCP) connections to communicate with servers.
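The response half of the exchange can be sketched in the same spirit: a small parser that splits the raw bytes received over the TCP connection into a status code, reason phrase, headers, and body. `parse_response` is a hypothetical helper, and it assumes the whole response has already been read and is not chunked or compressed.

```python
def parse_response(raw):
    """Split a raw HTTP/1.x response into (status, reason, headers, body)."""
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK (the reason phrase is optional).
    parts = status_line.split(" ", 2)
    status = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status, reason, headers, body
```

In practice a library such as Python's `http.client` does this work, including the cases this sketch leaves out.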
HTTP uses specific request methods to perform various tasks. All general-purpose HTTP servers are required to support the GET and HEAD methods, but not all support the rest of these request methods:
- GET: it requests a specific resource in its entirety
- HEAD: it requests only the headers of a specific resource, without the body content
- POST: it submits content, messages, or data to an existing web resource, often creating a new subordinate resource
- PUT: it replaces an existing web resource with the supplied content, or creates the resource at the given URI if it does not exist
- DELETE: it removes a specified resource
- TRACE: it asks the server to echo back the request it received, so users can see any changes or additions made by intermediaries along the way
- OPTIONS: it shows users which HTTP methods are available for a specific URL
- CONNECT: it converts the request connection to a transparent TCP/IP tunnel
- PATCH: it partially modifies a web resource
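As a rough illustration of how a server may support only some of these methods, the sketch below starts a throwaway server on the local machine with Python's standard `http.server` module and probes it with four methods. The handler class, the `run_demo` function, and the response body are assumptions made for the example; it implements GET, HEAD, and OPTIONS only.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<h1>hello</h1>"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        # GET returns the headers and the full body.
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # HEAD returns the same headers as GET, but no body.
        self._send_headers()

    def do_OPTIONS(self):
        # OPTIONS advertises the methods this resource accepts.
        self.send_response(204)
        self.send_header("Allow", "GET, HEAD, OPTIONS")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_demo():
    """Probe a local demo server; returns {method: (status, body, Allow)}."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method in ("GET", "HEAD", "OPTIONS", "DELETE"):
            conn = http.client.HTTPConnection(
                "127.0.0.1", server.server_port, timeout=5
            )
            conn.request(method, "/")
            resp = conn.getresponse()
            results[method] = (resp.status, resp.read(), resp.getheader("Allow"))
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results
```

Because the handler defines no `do_DELETE`, the standard library answers that request with a 501 (Not Implemented) status, which is one way a server signals that it does not support a method.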
- HTTPS and Switching from HTTP to HTTPS
HTTPS is HTTP with encryption and verification. The only difference between the two protocols is that HTTPS uses TLS (formerly SSL) to encrypt normal HTTP requests and responses, to protect them against tampering, and to verify the identity of the server through its certificate. As a result, HTTPS is far more secure than HTTP.
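From a client's point of view, "HTTP with encryption and verification" might look like the following sketch, which uses Python's standard `ssl` and `http.client` modules. The function name `open_https` and the host `example.com` are illustrative assumptions; creating the connection object does not contact the network until a request is sent.

```python
import http.client
import ssl


def open_https(host):
    """Prepare an HTTPS connection whose TLS layer verifies the server."""
    # The default context checks the certificate chain and the host name.
    context = ssl.create_default_context()
    # The HTTP messages are unchanged; they simply travel inside TLS.
    return http.client.HTTPSConnection(host, context=context), context
```

The request and response messages are the same as in plain HTTP; only the transport underneath them changes, along with the default port (443 instead of 80).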
Switching to HTTPS is a relatively simple process:
- Buy an SSL certificate: Many web hosting companies such as GoDaddy or Namecheap sell SSL certificates. However, if you wish to purchase a certificate from a third-party vendor, make sure to do your research thoroughly before purchasing.
- Install your SSL certificate through your web hosting: Each web hosting service will have different instructions for installing your SSL certificate.
- Update all internal links from HTTP to HTTPS: Some links may not be updated automatically. If you find any links that still point to the HTTP version of a page, update them directly in the code.
- Set up redirects from HTTP to HTTPS: Configure redirects so that a browser requesting the HTTP version of a page is sent to the HTTPS version.
If you're not willing to make these technical changes, most web hosting companies can handle it for you for a price. You still need to update or redirect any remaining internal links to HTTPS.
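The last two steps above (updating internal links and redirecting) can be sketched in a few lines. This is a simplified illustration, not a substitute for your web server's or host's own redirect configuration. The function names and the `example.com` host are assumptions, and the link rewriter deliberately ignores URLs with explicit ports.

```python
import re
from urllib.parse import urlsplit


def https_redirect(url):
    """Return (status, Location) for an http:// URL, or None otherwise."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    netloc = parts.netloc
    if netloc.endswith(":80"):
        # Drop the default HTTP port; it would be wrong for HTTPS.
        netloc = netloc[: -len(":80")]
    # 301 marks the move as permanent.
    return 301, parts._replace(scheme="https", netloc=netloc).geturl()


def upgrade_internal_links(html, site_host):
    """Rewrite http:// links to this site only; other hosts are untouched."""
    # The lookahead stops example.com from matching example.com.other.org.
    pattern = re.compile(r"http://" + re.escape(site_host) + r"(?![\w.:-])")
    return pattern.sub("https://" + site_host, html)
```

For example, `https_redirect("http://example.com/a?b=1")` keeps the path and query string and changes only the scheme.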
- The Beginning of HTTP
The development of HTTP was initiated by Tim Berners-Lee at CERN in 1989 and summarized in a simple document describing the behavior of clients and servers using the first version of the protocol, called 0.9. That first version soon evolved into a more detailed one, which became the first draft toward the much later version 1.0.
Development of an early HTTP Request for Comments (RFC) began a few years later as a coordinated effort between the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C), before work was transferred to the IETF.
- HTTP/1
HTTP/1 was finalized and fully documented (as version 1.0) in 1996. It evolved into version 1.1 in 1997, and its specification was then updated in 1999, 2014, and 2022. More than 80% of all websites use its secure variant, HTTPS.
- HTTP/2
HTTP/2, released in 2015, provides a more efficient expression of "on the wire" HTTP semantics. It is now used by 41% of websites and supported by almost all web browsers (over 97% of users). It is also supported by major web servers through Transport Layer Security (TLS) using the Application Layer Protocol Negotiation (ALPN) extension, which requires TLS 1.2 or newer.
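The ALPN negotiation mentioned above can be sketched on the client side with Python's standard `ssl` module. This only shows how a client would offer HTTP/2 during the TLS handshake. The standard library does not itself speak HTTP/2, and the function name is an assumption for the example.

```python
import ssl


def make_h2_capable_context():
    """TLS client context that offers HTTP/2, then HTTP/1.1, through ALPN."""
    context = ssl.create_default_context()
    # HTTP/2 over TLS requires TLS 1.2 or newer.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Protocols are listed in order of preference; "h2" is HTTP/2 over TLS.
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context
```

After a handshake on a socket wrapped with this context, `selected_alpn_protocol()` on the TLS socket reports which protocol the server picked, or `None` if the server did not take part in ALPN.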
- HTTP/3
HTTP/3, the successor of HTTP/2, was published in 2022. It is now used by over 25% of all websites and supported by many web browsers (over 75% of users). HTTP/3 uses QUIC instead of TCP as the underlying transport protocol. As with HTTP/2, it does not obsolete previous major versions of the protocol. Cloudflare and Google Chrome were the first to add support for HTTP/3, and Firefox has also enabled it. When enabled on the server, HTTP/3 has lower latency for real-world web pages and loads faster than HTTP/2, in some cases more than 3 times faster than HTTP/1.1 (which is still commonly the only version enabled).
- Security Concerns Surrounding HTTP
Unfortunately, attackers can choose from many vectors to attack web servers, web applications, and websites. From cross-site scripting (XSS) to HTTP request smuggling, adversaries often exploit well-known vulnerabilities and misconfigurations. Plain HTTP adds a further risk of its own: credentials sent over it travel in cleartext. The OWASP Foundation highlights the top 10 security risks to avoid when building web applications, to help improve defenses against HTTP-based attacks.
Many attacks also occur over HTTPS, an encrypted HTTP connection established using SSL/TLS. Monitoring encrypted traffic is therefore important for detecting attacks, such as Log4j exploits, that can hide in encrypted HTTPS traffic.
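To see why cleartext credentials are a problem, consider HTTP Basic authentication, used here as one common example. Its `Authorization` header is only Base64-encoded, not encrypted, so anyone who can read the traffic can recover the password. The function names and the sample credentials below are assumptions for the sketch.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an Authorization header for Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def sniff_credentials(header_value):
    """What an eavesdropper on plain HTTP can do with that header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    # Base64 is an encoding, not encryption: decoding needs no key.
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```

Sending the same header over HTTPS does not change the encoding, but the TLS layer keeps it from being read in transit.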
- Components of HTTP-based Systems
HTTP is a client-server protocol: requests are sent by one entity, the user-agent (or a proxy on behalf of it). Most of the time the user-agent is a Web browser, but it can be anything, for example a robot that crawls the Web to populate and maintain a search engine index.
Each individual request is sent to a server, which handles it and provides an answer, called the response. Between the client and the server there are numerous entities, collectively called proxies, which perform different operations and act as gateways or caches, for example.
In reality, there are more computers between a browser and the server handling the request: there are routers, modems, and more. Thanks to the layered design of the Web, these are hidden in the network and transport layers. HTTP is on top, at the application layer. Although important to diagnose network problems, the underlying layers are mostly irrelevant to the description of HTTP.
- Client: the user-agent
The user-agent is any tool that acts on the behalf of the user. This role is primarily performed by the Web browser; other possibilities are programs used by engineers and Web developers to debug their applications.
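A user-agent other than a browser usually identifies itself through the `User-Agent` request header. The sketch below prepares such a request with Python's standard `urllib.request` without sending it. The crawler name `demo-crawler/0.1` and the function name are made up for the example.

```python
import urllib.request


def build_crawler_request(url):
    """Prepare (but do not send) a GET request from a non-browser client."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": "demo-crawler/0.1"},  # hypothetical client name
    )
```

Passing the returned object to `urllib.request.urlopen` would actually send the request. Note that `Request` stores header names in capitalized form, so the header is read back with `get_header("User-agent")`.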
- The Web Server
On the opposite side of the communication channel is the server, which serves the document requested by the client. A server appears virtually as a single machine, but it may actually be a collection of servers sharing the load (load balancing), or a complex piece of software interrogating other computers (such as a cache, a database server, or e-commerce servers), totally or partially generating the document on demand.
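The idea of several machines appearing as one server can be illustrated with a round-robin picker, one of the simplest load-balancing strategies. This is a sketch only: the class name and backend names are assumptions, and real load balancers also track health and load.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in turn so each receives a share of the requests."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        # cycle() repeats the backend list endlessly, in order.
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        """Return the backend that should handle the next request."""
        return next(self._cycle)
```

The client sees a single host name, while successive requests are quietly spread across the machines behind it.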
- Proxies
Between the Web browser and the server, numerous computers and machines relay the HTTP messages. Due to the layered structure of the Web stack, most of these operate at the transport, network, or physical levels. They are transparent at the HTTP layer, although they can have a significant impact on performance. Those operating at the application layer are generally called proxies. These can be transparent, forwarding the requests they receive without altering them in any way, or non-transparent, in which case they change the request in some way before passing it along to the server.
Proxies may perform numerous functions: caching (the cache can be public or private, like the browser cache); filtering (like an antivirus scan or parental controls); load balancing (to allow multiple servers to serve the different requests); authentication (to control access to different resources); logging (allowing the storage of historical information).
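Two of these functions, caching and logging, can be sketched together as a wrapper around whatever actually contacts the origin server. The class below is a deliberately naive illustration: its name and the `fetch_from_origin` callable are assumptions, and a real cache would also honor headers such as `Cache-Control` and expire old entries.

```python
class CachingProxy:
    """Answer repeated requests from memory and log every lookup."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin: callable taking a URL and returning the body.
        self._fetch = fetch_from_origin
        self._cache = {}
        self.log = []

    def get(self, url):
        if url in self._cache:
            self.log.append(("HIT", url))
            return self._cache[url]
        # Cache miss: forward the request to the origin and remember the answer.
        self.log.append(("MISS", url))
        body = self._fetch(url)
        self._cache[url] = body
        return body
```

Only the first request for a given URL reaches the origin server; later ones are answered by the proxy, which is where the performance benefit of caching comes from.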