Introduction

Web-based applications are dangerous in an interconnected world. Attackers are seeking any opportunity that they can find when staging an attack and web exposure provides an attacker with many choices. Websites that are vulnerable can be used as mules in an attack against the true target.

Web server attacks are sometimes referred to as "blind" attacks as the attacker will not always see the result of the attack, only the HTTP response. Most of the time this is an expected error.

As with buffer overflows, programmers try to follow secure coding practices only to discover that attackers remain one step ahead of the game. All it takes is one attacker who discovers a vulnerability and then shares it with others in the underground, resulting in a bad day for all the administrators who have to deal with the damage.

Testing web applications should be a priority and part of an ongoing program of risk management. Many tools exist to help the security professional with this task; however, they must be used correctly, and their output must be analyzed correctly as well.

How Web Servers Work

A Web Server's Function

Essentially a file server, a Web server receives a request from a client application for a file and then returns the result. The "request / response" nature of HTTP must be understood, as well as the roles of both the client and the server.

When a Web server is contacted by a user, the user is usually logged on as an anonymous user. In the Windows environment, this account is IUSR_<computer name> and in Linux it is usually the account "nobody." The password in both cases is blank and the user account will have extremely limited access to the system. The request is made in the form of a Uniform Resource Locator (URL) and a series of header fields.

The HTTP Protocol

The Web server provides the HTTP service between the client and the server. The HyperText Transfer Protocol (HTTP) is a "request / response" transaction that allows the exchange of a set of data. The content of the file passed via HTTP is not important to the protocol, as it is up to the client to decide how to display the data. The specific media type, expressed as a Multipurpose Internet Mail Extensions (MIME) type, is listed in the HTTP header. This lets the client browser know which Browser Helper Object (BHO) or "plug-in" is needed to receive the file and process it.

The server is required to respond to a GET request. The status code in the response header indicates the nature of the response. The codes are grouped into series by their first digit:

Series   Meaning
100      Informational
200      Success
300      Redirection
400      Client Error
500      Server Error
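
The series lookup above can be sketched in a few lines. This is only an illustration; the function name `status_class` is hypothetical and not part of any library.

```python
def status_class(code: int) -> str:
    """Return the series name for a three-digit HTTP status code."""
    classes = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # The first digit of the code selects the series.
    return classes[code // 100]
```

For example, a 404 response falls in the 400 series, so `status_class(404)` reports a client error.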

Protocol header conversations are normally not visible to the user, as they are processed in the background. With packet sniffers, proxy applications or browser plug-ins, the headers can be viewed and tracked. These headers can also be modified during a Man-In-The-Middle (MiTM) attack.

HTTP defines several methods the client can invoke. Some are safe while others have some "side effects." The safe methods include:

  • HEAD - Requests the headers that would be returned if the specified resource were requested with an HTTP GET method.
  • GET - Requests a representation of the specified resource.
  • OPTIONS - Describes the communication options for the target resource.
  • TRACE - Echoes back to the client whatever string has been sent to the server, and is used mainly for debugging purposes.

The ones with the "side effects" include:

  • POST - Sends data to the server.
  • PUT - Creates a new resource or replaces a representation of the target resource with the request payload.
  • DELETE - Deletes the specified resource.
  • CONNECT - Starts two-way communications with the requested resource. It can be used to open a tunnel.
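
The two lists above can be captured in a small lookup, for example when a proxy or test harness needs to flag requests that may change server state. This is a sketch only; the names `SAFE_METHODS`, `SIDE_EFFECT_METHODS` and `has_side_effects` are made up for illustration.

```python
# Grouping taken directly from the two lists above.
SAFE_METHODS = frozenset({"HEAD", "GET", "OPTIONS", "TRACE"})
SIDE_EFFECT_METHODS = frozenset({"POST", "PUT", "DELETE", "CONNECT"})


def has_side_effects(method: str) -> bool:
    """Return True if the HTTP method is one of the 'side effect' methods."""
    name = method.upper()
    if name in SAFE_METHODS:
        return False
    if name in SIDE_EFFECT_METHODS:
        return True
    raise ValueError(f"method not covered by this sketch: {method}")
```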

Understanding Requests

When the GET method is called from an HTML form, the names of the form elements are paired with the data entered by the user and appended to the URL in the form of a query string. In the example, the query string is everything that follows the question mark.

http://company.domain.com/resource/path/page.ext?form_item1=string1&form_item2=string2

The name/value pairs are passed from the client to the page named in the form's action attribute and are separated by ampersands. The string to the left of the equal sign is the form element name and the value to the right is what the user has provided before clicking the submit button.
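
To make the structure concrete, the sketch below splits a URL of this shape back into its name/value pairs using Python's standard `urllib.parse` module. The helper name `form_values` is an assumption for illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def form_values(url: str) -> dict:
    """Return the name/value pairs carried in the query string of a URL."""
    query = urlsplit(url).query      # everything after the question mark
    return dict(parse_qsl(query))    # split on '&', then on '='


example = ("http://company.domain.com/resource/path/page.ext"
           "?form_item1=string1&form_item2=string2")
pairs = form_values(example)
```

Here `pairs` maps each form element name to the value the user supplied.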

Form field elements that are hidden in the browser but visible in the source code can carry name/value pairs as well. A form that tracks the number of login attempts made on a page might look as follows:

http://company.domain.com/resource/path/page.ext?attempts=3

The URL or the page source can be modified, reloaded and resubmitted, possibly overcoming the limitation imposed by the application. This is one of the most important characteristics of the GET method. If credentials are included in the URL, they will also be visible in the browser address bar.
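
The following sketch shows how little effort such a change takes, which is why a counter held on the client cannot be trusted by the server. The helper `with_param` is hypothetical; the URL mirrors the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_param(url: str, name: str, value: str) -> str:
    """Return a copy of the URL with one query-string parameter replaced."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    params[name] = value             # overwrite the client-side value
    return urlunsplit(parts._replace(query=urlencode(params)))


original = "http://company.domain.com/resource/path/page.ext?attempts=3"
tampered = with_param(original, "attempts", "0")
```

A safer design keeps the attempt counter on the server, keyed to the account or session, so that editing the URL has no effect.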

Usernames and passwords are not the only credentials used. Token strings are also common, most often implemented as "SessionID=lsdkfwishsleh4hott709" or something of similar makeup. The token is generated on the server and passed back and forth between the client and server in order to establish "state."

HTTP is stateless by design. HTTP 1.0 had no built-in support for "state"; it was added on top of the protocol later through cookies and session tokens. It is important for some web applications to distinguish unique users and track the time they spend on the website. A random string of characters makes this task easy. If an attacker can sniff or otherwise capture this token, they can assume the identity of the user by simply placing the captured token in their own requests. Because IP addresses might change during a session, and because the Internet was originally intended to be stateless, it was a design choice not to consider additional elements outside of this token when establishing a stateful connection. A token carried in the URL is not fully protected by Secure Socket Layer (SSL) either, since it still appears in the address bar, browser history and server logs.
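
A minimal sketch of token-based state follows, assuming an in-memory dictionary as the session store. The names `start_session`, `user_for` and `_sessions` are hypothetical. It also shows why a captured token is enough to impersonate a user: the server looks at nothing else.

```python
import secrets
from typing import Optional

# Hypothetical in-memory session store: token -> username.
_sessions = {}


def start_session(username: str) -> str:
    """Create an unpredictable token and remember who it belongs to."""
    token = secrets.token_hex(32)    # 32 random bytes, 64 hex characters
    _sessions[token] = username
    return token


def user_for(token: str) -> Optional[str]:
    """Return the user tied to a token; only the token itself is checked."""
    return _sessions.get(token)
```

Generating the token with the `secrets` module makes it hard to guess, but it does nothing to stop an attacker who sniffs it in transit.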

In contrast, the POST method places the name/value pairs in the body of the HTTP request rather than in the URL, and the request can be protected by an SSL connection. Proxy servers can still be used to perform MiTM attacks if the login occurs before the SSL tunnel is established. When sensitive data is exchanged, make sure that the SSL connection is established first.
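
The difference from GET can be seen by building a POST request and inspecting it without sending it. This sketch uses the standard `urllib.request.Request` class; the `/login` path and the field names are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(url: str, username: str, password: str) -> Request:
    """Build (but do not send) a POST request carrying form data in its body."""
    body = urlencode({"username": username, "password": password})
    return Request(url, data=body.encode("ascii"), method="POST")


request = build_login_request(
    "https://company.domain.com/login", "alice", "s3cret")
```

The credentials travel in `request.data`, and the URL in `request.full_url` stays free of them, so nothing sensitive lands in the address bar.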

URL Encoding Schemes

URL specifications state that certain characters are not allowed or must be encoded because they already have special meanings. For example, the "space" character cannot be used in a URL and must be replaced by %20 or, in a query string, by a plus sign. The ampersand, question mark and equal sign are parts of a query string, sometimes known as a parameter string.

"Percent encoding" is the most common way to encode characters. A percent sign is followed by the two-digit hexadecimal representation of the original character. Here are a few examples:

Character   Code
Space       %20
Dot         %2E
Slash       %2F
2           %32
There are other ways to encode URLs as well by representing the address of the website itself. Some characters can be double encoded and in some cases even triple encoding might work.
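
Double encoding works because the percent sign itself can be encoded as %25. The sketch below shows the effect, along with a hypothetical helper, `decode_fully`, that a filter might use to keep decoding until the value stops changing.

```python
from urllib.parse import quote, unquote

single = quote("/", safe="")        # the slash, encoded once
double = quote(single, safe="")     # the percent sign is encoded again


def decode_fully(value: str, max_rounds: int = 5) -> str:
    """Decode repeatedly until the value is stable, up to max_rounds."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            return decoded
        value = decoded
    raise ValueError("too many layers of encoding")
```

A filter that decodes only once would see `%2F` in the double-encoded value and might miss the slash that a later decoding step reveals.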

Since the client browser plays an active role in this process, along with the HTTP server, some attacks are better conducted directly through a Telnet connection to port 80 on the Web server.
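
What would be typed into such a Telnet session is simply the text of the request. The sketch below only builds that text; the function name `build_raw_request` is hypothetical and no network connection is made.

```python
def build_raw_request(host: str, path: str = "/", method: str = "GET") -> bytes:
    """Return the bytes of a minimal HTTP/1.1 request, exactly as typed."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",    # a blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


raw = build_raw_request("company.domain.com", "/resource/path/page.ext")
```

Because the path is written by hand, nothing normalizes or re-encodes it on the way to the server, which is the point of bypassing the browser.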

Secure Socket Layer (SSL)

Secure Socket Layer was created by Netscape. It was later adopted as an IETF standard under the name Transport Layer Security (TLS), which was updated in RFC 5246. This protocol is the origin of the small padlock icon that appears in your browser.

The server is authenticated by the client in order to exchange a cryptographic secret. This secret is used to create an encapsulation that encrypts the traffic of the session. Because the protocol operates at a lower OSI layer than HTTP, all traffic between the client and server is protected.

The server sends a digital certificate to the client. The client will then validate this certificate for a level of trust. The client then uses the public key within the certificate to encrypt a symmetric key that is generated randomly by the client and then sends it to the server. The server then decrypts the symmetric key with its own private key. This shared secret is then used to protect the rest of the session data.
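
From the client's side, the certificate validation step can be sketched with Python's standard `ssl` module. The function names are hypothetical, and `fetch_peer_certificate` needs network access, so it is defined here but not called.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a context that validates the server certificate and host name."""
    return ssl.create_default_context()


def fetch_peer_certificate(host: str, port: int = 443) -> dict:
    """Connect, let the handshake validate the certificate, and return it."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_socket:
        # wrap_socket performs the handshake; it raises an SSL error if the
        # certificate is not trusted or does not match the host name.
        with context.wrap_socket(raw_socket, server_hostname=host) as tls:
            return tls.getpeercert()


context = make_client_context()
```

The key exchange itself is handled inside the library, so the application only sees success or a raised error.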

A number of protocols are supported by SSL/TLS. Several security issues have been identified with this technique, and business-related pressures keep legacy implementations in use.

Earlier installations of SSL employed 40-bit encryption because of cryptographic export restrictions. Under some pressure, some of these standards were relaxed, and weak encryption remains an issue when certifying an organization to conduct e-commerce. The Payment Card Industry Data Security Standard (PCI DSS) has strict requirements for protecting consumer transactions.

As SSL/TLS evolved, many customers with older browsers still required legacy support. In addition, there is a lot of potential for this technology to be implemented incorrectly. Both penetration testers and attackers are always on the lookout for errors in this area.
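
One common way to avoid legacy fallbacks is to set a floor on the protocol version. The sketch below does this with Python's `ssl` module. The choice of TLS 1.2 as the minimum is an assumption for illustration, not a requirement stated here.

```python
import ssl


def make_strict_context() -> ssl.SSLContext:
    """Create a client context that refuses protocol versions below TLS 1.2."""
    context = ssl.create_default_context()
    # Assumed policy for this sketch: nothing older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


strict = make_strict_context()
```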

Authentication Methods

The three main types of authentication methods supported by Web servers are:

  • Basic
  • Digest
  • Application

Basic - The user name and password are encoded in Base64, which is effectively plain text because it is trivially decoded. The encoding is needed to support an extended set of characters.

Base64 encoding takes the binary representation of a string and repackages it into 6 bits per character. This method is also a common way to send binary files over systems that are meant for text only like Usenet.
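
The sketch below builds and then reverses a Basic authentication header with the standard `base64` module, which shows that the encoding offers no secrecy. The function names are hypothetical; the sample credentials are the well-known ones from the HTTP authentication RFCs.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an Authorization header for Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header: str) -> tuple:
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, encoded = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authentication header")
    username, _, password = base64.b64decode(encoded).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("Aladdin", "open sesame")
```

Anyone who captures the header can call the equivalent of `decode_basic_auth` on it, so Basic authentication is only reasonable over an encrypted connection.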

Digest - Incorporates a nonce value, a string that is generated only once, which is combined with the password before hashing. The design was created to combat chosen plaintext attacks, in which the attacker takes a combination of characters and attempts to reproduce the encrypted stream.

Additional features were added in an attempt to prevent replay attacks by ensuring that the nonce value has a correlation with time. Replay attacks happen when credentials are sniffed from the network and presented to the server at a later time in the hope of being authenticated.
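
How the nonce changes the value sent over the wire can be sketched as follows. This follows the original, simplest form of the Digest calculation with MD5 and leaves out the later quality-of-protection fields. The function names and sample values are assumptions for illustration.

```python
import hashlib


def _md5_hex(text: str) -> str:
    """Return the MD5 digest of the text as lowercase hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Compute a Digest response: hash of secret, nonce and request."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")   # identifies the user
    ha2 = _md5_hex(f"{method}:{uri}")                  # identifies the request
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


first = digest_response("alice", "example", "s3cret", "GET", "/", "nonce-1")
second = digest_response("alice", "example", "s3cret", "GET", "/", "nonce-2")
```

Because the server issues a fresh nonce, a response captured for one nonce does not match the next challenge, which is what limits replay.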

Application - Authentication methods are supported by either the Web server or are custom built depending on the platform.

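The Apache Web server controls access to protected directories through a file known as .htaccess, which points to a separate password file (commonly named .htpasswd) holding the credentials. Newer configurations keep the password file outside the website's file system, but earlier setups often placed it in a potentially vulnerable location.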

The IIS Web server supports integrated authentication with Windows domain services. In this case the user account credentials are stored in the same way as they would be for a local account.

Database applications are often custom built for storing credentials. These involve checking the database against information that the user supplies on a login page. It is up to the application developer to ensure that hashing is properly implemented and that the SSL connection is established before the login process occurs.
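
One way a developer might implement the hashing is a salted, iterated hash from Python's standard library. This is a sketch under assumptions: the function names, the salt length and the iteration count are illustrative choices, not values taken from this text.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000   # illustrative work factor, chosen for this sketch


def hash_password(password: str, salt: bytes = None) -> tuple:
    """Return (salt, digest) for storage instead of the plain password."""
    if salt is None:
        salt = secrets.token_bytes(16)    # a fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The database then stores only the salt and the digest, so a stolen table does not directly reveal the passwords.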

These applications can be subject to brute force password attacks using tools such as Brutus or Hydra unless the developer has implemented a way to limit input tries. It is common for a CAPTCHA to be presented after three to five failed login attempts. CAPTCHAs are the distorted images used in a form to determine whether the input is generated by a human or by a machine.
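
A limit on input tries can be sketched as a simple per-user counter. The class name `LoginLimiter` and its methods are hypothetical, and the in-memory dictionary stands in for whatever storage the application uses.

```python
class LoginLimiter:
    """Track failed logins and decide when a CAPTCHA or lockout is due."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self._failures = {}    # username -> consecutive failed attempts

    def record_failure(self, username: str) -> None:
        self._failures[username] = self._failures.get(username, 0) + 1

    def record_success(self, username: str) -> None:
        # A successful login clears the counter.
        self._failures.pop(username, None)

    def needs_challenge(self, username: str) -> bool:
        """True once the user has reached the failure threshold."""
        return self._failures.get(username, 0) >= self.max_failures
```

The counter has to live on the server; a counter kept in a hidden field or URL parameter can be reset by the client.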

How Web Applications Work

There are three basic layers that Web Applications work on. These are:

  • The Presentation Layer
  • The Logic Layer
  • The Database Layer

The Presentation Layer

The Presentation Layer is where the code gets processed in the visitor's browser. The browser's job in a Web application is to resolve DNS addresses, make HTTP requests, receive the page and all resources within the page and then render the page visually through a variety of rendering engines and plug-ins, including Browser Helper Objects (BHOs).

Three languages are mainly involved; used together, they are called Dynamic Hypertext Markup Language (DHTML).

  • HTML
  • CSS
  • JavaScript

HTML - Hypertext Markup Language provides a way to describe the structure of a page. It marks the elements such as headers, paragraphs, articles as well as other resources.

CSS - Cascading Style Sheets handle the "look and feel" of a Web page. Browsers have a default way of displaying a Web page, but by using CSS, the designer can provide a unique "style" to the page. The flexibility of this design allows the same page content to be usable in a variety of browser clients.

JavaScript - This is a scripting language that adds interactive elements to the document and can access the object model created from the HTML as well as objects that are built into the browser.

The technology known as Asynchronous JavaScript and XML (AJAX) has transformed the client experience. AJAX is a set of techniques that uses the XMLHttpRequest Application Programming Interface (API) to send HTTP requests and pass the results directly to the script in the page on the client side. This enables a continuous conversation between the client and server.

HTML parsing is forgiving of poorly written code; AJAX, however, requires unambiguous object models created by the markup of page elements. eXtensible Markup Language (XML) provides the specification for well-formed markup along with the ability to create an entirely new vocabulary for describing objects. CSS will still be used to define the visual properties of the page.
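
The difference between forgiving HTML and strict XML can be shown with Python's standard XML parser, which rejects markup that a browser would quietly accept. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


closed = "<p>one<br/>two</p>"     # every element is closed
unclosed = "<p>one<br>two</p>"    # <br> is never closed
```

This check is meant for illustration on trusted input; parsing untrusted XML generally calls for a hardened parser configuration.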

AJAX programming is much more involved than standard DHTML, but threats still exist if the application is not coded properly. Having the server side validate the form data is a big improvement because the checks can no longer be altered on the client side, but XSS and SQL injection attacks are still possible.

The client side can still be manipulated by saving the source code of the page, editing it in a text editor and opening the modified page in the browser. JavaScript that is meant to validate the form can be removed, and parameters written to cookies can be altered as well, even if the cookie is encrypted when it is stored on the hard drive.

The Logic Layer

Through HTTP, the Logic Layer processes active code within the pages of a Web site. When a client requests a page that is recognized as containing code that must be processed, the server runs the code, and the output is a complete Web page, generated as text, that is returned to the client in the response.
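
As a rough sketch of this step, the function below runs on the server and returns a complete page as text. It uses Python's WSGI calling convention purely for illustration; the `name` parameter and the greeting are assumptions.

```python
from html import escape
from urllib.parse import parse_qs


def application(environ, start_response):
    """Generate a full HTML page from the request's query string."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["visitor"])[0]
    # The page is assembled as a text string; user input is escaped first.
    page = f"<html><body><p>Hello, {escape(name)}!</p></body></html>"
    body = page.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The client never sees this code, only the HTML it produces.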

Web pages that need to generate a standard page but with different information provided on that page, such as Google, use "on-the-fly" logic layer processing. This process initiates a connection to a database, makes a query and then displays the content needed per the client's request.

Any language can be used for the server-side functionality of a Web application. Common Gateway Interface (CGI) is the specification describing how to create Web applications that meet the unique needs of the Internet environment and cooperate with HTTP and older protocols. Some examples include:

  • PERL (.pl)
  • PHP (.php)
  • Active Server Pages (.asp / .aspx / ASP.NET)
  • Cold Fusion Markup Language (.cfml)
  • Ruby

PERL (Practical Extraction and Report Language) was originally designed as a set of string parsing libraries to replace command line tools such as sed and awk, and it was meant for sorting through large log files. The text-based nature of HTTP messages makes PERL an ideal CGI language.

PHP originally stood for "Personal Home Page" and was later renamed "PHP: Hypertext Preprocessor," but it is now known simply as PHP. It was created to be a CGI language and has a very active community of developers, as well as its share of security issues over the years.

Active Server Pages is a CGI language supported by Microsoft and the Internet Information Services (IIS) Web server. The .NET version supports server-side form validation and other enhanced features that make up for the deficiencies of the .asp libraries.

An attacker will look for a file at the root of the Web directory if the site is using .asp. This file is called "global.asa." It holds the main configuration of the website and might contain hard-coded database connection strings and passwords.

Cold Fusion Markup Language (CFML), owned by Adobe (previously Macromedia), is a tag-based syntax that allows the developer to easily define reusable code functions that can be called at any time from these "tags."

A relative newcomer to the Web application crowd, Ruby aims to take the idea of Rapid Application Development (RAD) to a whole new level by providing an Application Programming Interface (API) of many commonly used functions that can be reused with little to no customization.

Many scripts are available for these languages, so programmers can find the code they need without having to rewrite it. This is commonly known as "Shrink-Wrapped Code." Any existing vulnerabilities will propagate to the website where the code is used. This means that the developer must analyze the code to ensure there are no backdoors or other known issues. An attacker will look for any obvious signs of code reuse and perform a Google search to locate additional vulnerable websites.

The Database Layer

When a Web page is executed at the logic layer, it is often necessary to start a session with a database server and pass it an SQL request. Anything from user credentials to Binary Large Objects (BLOBs) can be stored in the database, which is populated through other applications such as a content management system.

The results that come back from a successful SQL request are processed by the logic layer and formatted into standard HTML while the document is being prepared for the requesting party. Each time a logic layer script runs, a new session is established, and it closes once the transaction is complete. A driver is used to establish this connection.
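
The open, query, format and close cycle can be sketched with Python's built-in `sqlite3` driver standing in for whatever database the site uses. The `articles` table, the file name and both function names are assumptions for illustration.

```python
import os
import sqlite3
import tempfile
from html import escape


def create_demo_database() -> str:
    """Create a small throwaway database file and return its path."""
    path = os.path.join(tempfile.mkdtemp(), "site.db")
    connection = sqlite3.connect(path)
    connection.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    connection.executemany(
        "INSERT INTO articles (title) VALUES (?)", [("First",), ("A & B",)])
    connection.commit()
    connection.close()
    return path


def render_articles(db_path: str) -> str:
    """Open a session, run the query, close it, and format rows as HTML."""
    connection = sqlite3.connect(db_path)       # new session per request
    try:
        rows = connection.execute(
            "SELECT title FROM articles ORDER BY id").fetchall()
    finally:
        connection.close()                      # closed once the work is done
    items = "".join(f"<li>{escape(title)}</li>" for (title,) in rows)
    return f"<ul>{items}</ul>"
```

Each call to `render_articles` opens and closes its own connection, mirroring the per-script session described above.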

On a Windows system, the administrator uses the Data Connection applet in the Control Panel to set up a Data Source Name (DSN). It is then simple for the .asp code to construct an object based on the DSN and pass it an SQL query. The end result is the same regardless of which driver is used: credentials are passed along with a Layer 5 session request, an SQL query is submitted, the results are returned and the connection is closed.

If an attacker can obtain the credentials and determine the database technology, the attacker can connect to the appropriate port or ports with a front-end tool and have access to all data. Another possibility is to manipulate the SQL query all the way from the presentation layer. This is the basis of an SQL Injection attack.

The Attacks and Risks of Web Applications

Here are 13 known attack types that focus on Web applications.

Banner Grabbing occurs during the information discovery phase of an attack. It is used to discover the Web server and operating system versions.
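
The information usually comes from the Server header of an ordinary response. The sketch below extracts it from captured response text; the function name and the sample banner are made up for illustration, and no connection is made.

```python
from typing import Optional


def server_banner(raw_response: str) -> Optional[str]:
    """Return the value of the Server header in a raw HTTP response, if any."""
    head = raw_response.split("\r\n\r\n", 1)[0]    # headers end at blank line
    for line in head.split("\r\n")[1:]:            # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "server":
            return value.strip()
    return None


sample = ("HTTP/1.1 200 OK\r\n"
          "Server: ExampleServer/1.0 (ExampleOS)\r\n"
          "Content-Type: text/html\r\n"
          "\r\n"
          "<html></html>")
```

Administrators can reduce what this reveals by configuring the server to send a minimal banner.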

Denial of Service (DoS) attacks are used to either embarrass the owner of the website or as a method of extortion.

Password Guessing is, as the name suggests, used to guess a user's password. One of the best defenses against this type of attack is to limit how many password attempts can be made before the account is locked.

Robots.txt File Abuse reveals to the attacker what is not meant to be seen online. Robots, also known as Spiders, are search engine tools that crawl the directories of a website looking for pages to catalog into the search engine. The robots.txt file, placed in the root of the website, tells a robot what it can and cannot index. It also shows an attacker where to find valuable information.
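
A reviewer can see what the file exposes by listing its Disallow entries, as in this sketch. The function name and the sample file contents are assumptions for illustration.

```python
def disallowed_paths(robots_txt: str) -> list:
    """Return the paths named in Disallow lines of a robots.txt file."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()     # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


sample_robots = (
    "User-agent: *\n"
    "Disallow: /admin/\n"
    "Disallow: /backup/  # old database dumps\n"
    "Allow: /public/\n"
)
```

Anything listed this way should be protected by real access controls, since the file itself is public.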

Offline Browsing is at times more efficient for the attacker because they can download the website first and then search through it offline for information such as keywords, email addresses, names, etc.

Hidden Form Fields are not displayed in the browser window but can be seen in the source code of the page. They can carry important data that is sent back to the server with the form submission.
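
Finding these fields takes nothing more than a pass over the page source, as this sketch with Python's standard `html.parser` module shows. The class and function names and the sample form are hypothetical.

```python
from html.parser import HTMLParser


class HiddenFieldFinder(HTMLParser):
    """Collect the name and value of every hidden input in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and (attributes.get("type") or "").lower() == "hidden":
            self.fields[attributes.get("name")] = attributes.get("value")


def hidden_fields(page_source: str) -> dict:
    finder = HiddenFieldFinder()
    finder.feed(page_source)
    return finder.fields


sample_form = (
    '<form action="/login" method="post">'
    '<input type="hidden" name="attempts" value="3">'
    '<input type="text" name="user">'
    '</form>'
)
```

Since the values are this easy to read and change, the server should treat them like any other untrusted input.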

Directory Traversal is also known as "navigating to the parent" directory. The root of the website is the root of what the user can see on the system. On some systems it is possible to navigate to several parents and then drill down into the file system itself.
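
A common defense is to normalize the requested path and confirm that it still sits under the web root. The sketch below does this with POSIX-style paths; the function name and the `/var/www/site` root are assumptions for illustration.

```python
import posixpath


def resolve_inside_root(web_root: str, requested_path: str) -> str:
    """Return the normalized path, or raise if it escapes the web root."""
    root = posixpath.normpath(web_root)
    candidate = posixpath.normpath(
        posixpath.join(root, requested_path.lstrip("/")))
    # After normalization, every ".." has been resolved.
    if candidate != root and not candidate.startswith(root + "/"):
        raise PermissionError(f"path escapes the web root: {requested_path}")
    return candidate
```

This sketch does not follow symbolic links, and the request must be fully percent-decoded before the check; a production version would need to handle both.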

URL Obfuscation, also known as a hyperlink trick, is a type of attack in which the real URL that a user is directed to is obfuscated, or concealed, in an attempt to encourage the user to click through to the spoofed Web site.
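
One classic form of this trick places a trusted-looking name before an "@" sign, where it is treated as a user name rather than the host. The sketch below uses the standard `urllib.parse.urlsplit` to reveal the real destination; the URL and the address in it are made-up examples.

```python
from urllib.parse import urlsplit


def real_host(url: str) -> str:
    """Return the host the browser would actually connect to."""
    return urlsplit(url).hostname


# Looks like company.domain.com at a glance, but the host follows the '@'.
deceptive = "http://www.company.domain.com@198.51.100.7/login"
parts = urlsplit(deceptive)
```

Here `parts.username` holds the decoy name and `parts.hostname` holds the address that is actually contacted.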

Cookie Stealing is the process of obtaining a visitor's cookies, the small pieces of data that a Web application stores on the client side, which can reveal the visitor's activity on the site. The ways that cookie stealing can occur are:

  • Having physical access to a system, which allows the attacker to simply copy the cookies out to a file.
  • Using a sniffer or proxy server to eavesdrop on unencrypted cookies, which are then submitted with the attacker's next page request.
  • Triangulating cookie tokens on the back end of an application.
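
Two standard cookie attributes reduce the exposure described above: Secure keeps the cookie off unencrypted connections, and HttpOnly hides it from page scripts. The sketch below sets both with Python's standard `http.cookies` module; the function name and the token value are hypothetical.

```python
from http.cookies import SimpleCookie


def session_cookie_header(token: str) -> str:
    """Return a Set-Cookie header line with Secure and HttpOnly set."""
    cookie = SimpleCookie()
    cookie["SessionID"] = token
    cookie["SessionID"]["secure"] = True      # only sent over encrypted links
    cookie["SessionID"]["httponly"] = True    # not readable from page scripts
    cookie["SessionID"]["path"] = "/"
    return cookie.output()


header_line = session_cookie_header("abc123")
```

These flags do not help against an attacker with physical access to the machine, so short session lifetimes still matter.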

Session ID Hijacking is a process of sniffing a session ID string from a legitimate user and then using that session ID to start a new session with the website using the stolen session. Session IDs are strings of characters that are associated with a visitor's current visit to a website.

Cross-Site Scripting (XSS) takes advantage of dynamically generated Web pages. In an XSS attack, a Web application is sent a script that activates when it is read by a browser or by an application that has not protected itself against cross-site scripting. This attack was originally known as a CSS attack, but the name was changed to XSS, and it is still sometimes referred to as JavaScript injection.
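
A basic protection is to escape user-supplied text before placing it in a page, so the browser displays it instead of running it. The sketch below uses the standard `html.escape` function; the `render_comment` helper is hypothetical.

```python
from html import escape


def render_comment(comment: str) -> str:
    """Wrap a user comment in a paragraph, escaping any markup it contains."""
    return f"<p>{escape(comment)}</p>"


safe_html = render_comment("<script>alert(1)</script>")
```

Escaping must match the context: text placed inside attributes, scripts or URLs needs its own encoding rules.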

SQL Injection, working on similar principles to Cross-Site Scripting, occurs when malicious input is submitted to a poorly designed application and then passed to the back-end database. The malicious data then produces database query results or actions that should never have been executed.
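
The contrast between a query built by string concatenation and a parameterized query can be sketched with Python's built-in `sqlite3` module. The `users` table, the function names and the plain-text password column are simplifications for illustration only.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one demo user."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE users (name TEXT, password TEXT)")
    # Plain-text password only to keep the demonstration short.
    connection.execute("INSERT INTO users VALUES ('alice', 'wonderland')")
    return connection


def login_vulnerable(connection, name: str, password: str) -> bool:
    """Builds the SQL by pasting user input into the query text."""
    query = ("SELECT COUNT(*) FROM users "
             f"WHERE name = '{name}' AND password = '{password}'")
    return connection.execute(query).fetchone()[0] > 0


def login_parameterized(connection, name: str, password: str) -> bool:
    """Passes user input as bound parameters, never as SQL text."""
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return connection.execute(query, (name, password)).fetchone()[0] > 0


db = make_demo_db()
injected = "' OR '1'='1"    # closes the quote and adds an always-true test
```

With the injected value, the concatenated query accepts the login while the parameterized one treats the whole string as a wrong password.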