Introduction
Web-based applications are dangerous in an interconnected world. Attackers seek any opportunity they can find when
staging an attack, and web exposure gives them many choices. Vulnerable websites can also be used as mules in an
attack against the true target.
Web server attacks are sometimes referred to as "blind" attacks as the attacker will not always see the result of the
attack, only the HTTP response. Most of the time this is an expected error.
As with buffer overflows, programmers try to follow secure coding practices only to discover that attackers
remain one step ahead of the game. All it takes is one attacker who discovers a vulnerability and then shares it
with others in the underground, resulting in a bad day for every administrator who has to deal with the damage.
Web applications must therefore be tested as part of an ongoing program of risk management. Many tools exist to
help the security professional with this task; however, they must be used correctly and their output must be
analyzed correctly as well.
How Web Servers Work
A Web Server's Function
Essentially a file server, a Web server receives requests from a client application for a file and then returns the
results. The "request / response" nature of HTTP must be understood, as well as the roles of both the client and the server.
When a Web server is contacted by a user, the user is usually logged on as an anonymous user. In the Windows environment,
this account is IUSR_<computer name> and in Linux it is usually the account "Nobody." The password in both cases is
blank and the user account will have extremely limited access to the system. The request made is in the form of a
Uniform Resource Locator (URL) and a series of header messages.
The HTTP Protocol
The HTTP service between the client and server is provided by the Web server. The HyperText Transfer Protocol is a
"request / response" transaction that allows the exchange of a set of data. The content of the file passed via HTTP
is not important to the protocol, as it is up to the client to decide how to display this data. The specific media
type, expressed as a Multipurpose Internet Mail Extensions (MIME) type, is listed in the HTTP header. This tells the
client browser which Browser Helper Object (BHO) or "plug-in" it needs to open in order to receive the file and process it.
The server is required to respond to a GET request. The code in the protocol header indicates the nature of the response.
Here are some examples of these responses.
Series | Meaning
100 | Informational
200 | Success
300 | Redirection
400 | Client Error
500 | Server Error
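The series in the table can be read directly off the first digit of a status code. The sketch below is only an illustration of that rule; the name status_series is an assumption made for this example and is not part of any HTTP library.

```python
# Map the first digit of a status code to the series listed in the table.
SERIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_series(code: int) -> str:
    """Return the meaning of the series that a status code belongs to."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return SERIES[code // 100]


if __name__ == "__main__":
    for code in (200, 301, 404, 500):
        print(code, status_series(code))
```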
Protocol header conversations are normally not visible to the user, as they are processed in the background. With
the use of packet sniffers, proxy applications or plug-ins available for browsers, the headers can be viewed and tracked.
These headers can also be modified during a Man-In-The-Middle (MiTM) attack.
HTTP defines several methods the client can invoke. Some are safe while others have some "side effects." The safe
methods include:
- HEAD - Requests the headers that would be returned if the specified resource were requested with an HTTP GET
method.
- GET - Requests a representation of the specified resource.
- OPTIONS - Is used to describe the communication options for the target resource.
- TRACE - Echoes back to the client whatever string has been sent to the server, and is used mainly for debugging
purposes.
The ones with the "side effects" include:
- POST - Sends data to the server.
- PUT - Creates a new resource or replaces a representation of the target resource with the request payload.
- DELETE - Deletes the specified resource.
- CONNECT - Starts two-way communications with the requested resource. It can be used to open a tunnel.
Understanding Requests
When the GET method is called from an HTML form, the names of the form elements are paired up with the input data that
is entered by the user and appended to the URL in the form of a query string. In the example below, the query string
is everything that follows the question mark.
http://company.domain.com/resource/path/page.ext?form_item1=string1&form_item2=string2
The name-value pairs are separated by ampersands and are passed from the client to the page named in the action
attribute of the form. The string to the left of the equal sign is the form element name and the value to the right
is what the user provided before clicking the submit button.
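A minimal sketch of how a query string breaks down into name-value pairs, using Python's standard urllib.parse module. The URL and field names mirror the illustrative example in the text; the helper names query_pairs and build_query are assumptions made for this sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Illustrative URL modeled on the example in the text.
EXAMPLE_URL = (
    "http://company.domain.com/resource/path/page.ext"
    "?form_item1=string1&form_item2=string2"
)


def query_pairs(url: str) -> list:
    """Return the (name, value) pairs that follow the question mark."""
    return parse_qsl(urlsplit(url).query)


def build_query(pairs) -> str:
    """Join (name, value) pairs with '=' and '&', encoding special characters."""
    return urlencode(pairs)


if __name__ == "__main__":
    print(query_pairs(EXAMPLE_URL))
    print(build_query([("user", "alice"), ("note", "a b")]))
```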
Form field elements that are hidden in the browser but visible in the source code can carry name-value pairs as well. A
form that is tracking the number of login attempts made on a page might look as follows:
http://company.domain.com/resource/path/page.ext?attempts=3
The URL or the source code can be modified, reloaded and resubmitted, possibly overcoming the limitation imposed by the
application. This is one of the most important characteristics of the GET method. If credentials are included in the
URL, they will be visible in the browser address bar.
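To make the risk concrete, the sketch below rewrites a single query-string parameter before the URL is resubmitted, which is all it takes to reset a counter that the application trusts the client to carry. The function name set_param and the URL are assumptions for illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_param(url: str, name: str, value: str) -> str:
    """Return the URL with one query-string parameter replaced."""
    parts = urlsplit(url)
    pairs = [(k, value if k == name else v) for k, v in parse_qsl(parts.query)]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


if __name__ == "__main__":
    original = "http://company.domain.com/resource/path/page.ext?attempts=3"
    # The client, not the server, decides what value is sent back.
    print(set_param(original, "attempts", "0"))
```

A counter like this is only reliable when the server keeps it, for example keyed by account name, rather than reading it back from the request.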
User names and passwords are not the only credentials used. Credentials can also be token strings, most often
implemented as "SessionID=lsdkfwishsleh4hott709" or something of similar makeup. The token is generated on the server
and passed back and forth between the client and server in order to establish "state."
HTTP was designed to be stateless, so "state" has to be layered on top of it with tokens such as cookies or URL
parameters. It is important for some web applications to identify unique users and the time they spend on the website.
A random string of characters makes this task easy. If an attacker can sniff or otherwise capture this token, they can
assume the identity of the user: they simply place the stolen token in their own request. Because IP addresses might
change during a session, and because the Internet was originally intended to be stateless, it was a design choice not
to consider additional elements outside of this token when establishing a stateful connection. A token carried in the
URL of a plain HTTP request is not protected by Secure Socket Layer (SSL).
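The sketch below shows the idea of a server-generated session token under simple assumptions: an in-memory dictionary stands in for the server's session store, and the names SESSIONS, start_session and user_for are hypothetical. It also shows why a captured token is enough to impersonate a user, since the lookup considers nothing but the token.

```python
import secrets

# Hypothetical in-memory session store: token -> user name.
SESSIONS = {}


def start_session(user: str) -> str:
    """Create an unpredictable token and remember which user it belongs to."""
    token = secrets.token_urlsafe(24)
    SESSIONS[token] = user
    return token


def user_for(token: str):
    """Return the user tied to a token, or None if the token is unknown."""
    return SESSIONS.get(token)


if __name__ == "__main__":
    token = start_session("alice")
    # Anyone who presents this token is treated as alice.
    print(user_for(token))
```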
In contrast, the POST method places the name-value pairs in the body of the HTTP request, which can be protected by an
SSL connection. Proxy servers can still be used to perform MiTM attacks if the login occurs before the SSL tunnel is
established. When sensitive data is exchanged, make sure that the SSL connection is established first.
URL Encoding Schemes
URL specifications state that certain characters either may not be used or must be encoded because they already
have special meanings. For example, the "space" character cannot be used in a URL and must be replaced by a plus
sign or by %20. The ampersand, question mark and equal sign are parts of a query string, sometimes known as a
parameter string. "Percent encoding" is the most common way to encode characters: a percent sign is followed by a
two-digit hexadecimal representation of the original character. Here are a few examples:
Character | Code
Space | %20
Dot | %2E
Slash | %2F
2 | %32
There are other ways to encode URLs as well by representing the address of the website itself. Some characters can
be double encoded and in some cases even triple encoding might work.
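The codes in the table, and the idea of double encoding, can be reproduced with a few lines. This is a sketch for single ASCII characters only; percent_encode and double_encode are names chosen for this example, while unquote is the standard decoder from urllib.parse.

```python
from urllib.parse import unquote


def percent_encode(ch: str) -> str:
    """Encode one ASCII character as '%' plus two hexadecimal digits."""
    return "%{:02X}".format(ord(ch))


def double_encode(ch: str) -> str:
    """Encode the character, then encode the resulting '%' sign again."""
    return percent_encode(ch).replace("%", "%25")


if __name__ == "__main__":
    for ch in (" ", ".", "/", "2"):
        print(repr(ch), percent_encode(ch), double_encode(ch))
    # A double-encoded value needs two decoding passes to reveal the character.
    print(unquote(unquote(double_encode("."))))
```

Double encoding matters when a filter decodes the input once and checks it, while a later component decodes it a second time.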
Since the client browser plays an active role in this process, along with the HTTP server, some attacks are better
conducted directly through a Telnet connection to port 80 on the Web server.
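What would be typed into a Telnet session can also be sent from a short script, which keeps the browser out of the picture in the same way. The sketch below builds the request by hand and sends it over a plain socket. The host name in the usage section is a placeholder, and build_request and send_raw are names invented for this example.

```python
import socket


def build_request(host: str, path: str = "/", method: str = "GET") -> bytes:
    """Build the exact bytes of a minimal HTTP/1.1 request."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_raw(host: str, request: bytes, port: int = 80, timeout: float = 5.0) -> bytes:
    """Send raw bytes to the server and return everything it sends back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Placeholder host; nothing is sent unless send_raw is called.
    print(build_request("www.example.com", "/index.html").decode("ascii"))
```

Because the path is passed through untouched, encoded sequences reach the server exactly as written, with no normalization by a browser.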
Secure Socket Layer (SSL)
Secure Socket Layer was created by Netscape. It was later adopted as an IETF standard under the name Transport Layer
Security (TLS) and updated in RFC 5246. This protocol is the origin of the small padlock icon that appears in your browser.
The server is authenticated by the client in order to exchange a cryptographic secret. This secret is used to create
an encapsulation that encrypts the traffic of the session. Because SSL operates at a lower OSI layer than HTTP, all
traffic between the client and server is protected.
The server sends a digital certificate to the client. The client will then validate this certificate for a level of
trust. The client then uses the public key within the certificate to encrypt a symmetric key that is generated
randomly by the client and then sends it to the server. The server then decrypts the symmetric key with its own
private key. This shared secret is then used to protect the rest of the session data.
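The key exchange step described above can be illustrated with textbook RSA on deliberately tiny numbers. This is a toy under loud assumptions: the primes 61 and 53 are classroom values, there is no padding or certificate validation, and none of this reflects how a real SSL/TLS library is implemented. It only shows the client encrypting a random secret with the server's public key and the server recovering it with the private key.

```python
import secrets

# Toy server key pair built from tiny textbook primes (illustration only).
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (modular inverse of E)


def new_session_key() -> int:
    """Client side: pick a random secret smaller than the modulus."""
    return secrets.randbelow(N - 2) + 2


def client_wrap_key(session_key: int) -> int:
    """Client side: encrypt the secret with the server's public key (E, N)."""
    return pow(session_key, E, N)


def server_unwrap_key(wrapped: int) -> int:
    """Server side: recover the secret with the private key (D, N)."""
    return pow(wrapped, D, N)


if __name__ == "__main__":
    key = new_session_key()
    wrapped = client_wrap_key(key)
    print(server_unwrap_key(wrapped) == key)
```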
A number of protocols are supported by SSL/TLS. Several security issues have been identified with this technique,
and business-related pressures keep legacy implementations in use.
Early installations of SSL used 40-bit encryption because of cryptographic export restrictions. Under some pressure,
some of these standards were relaxed, but this remains an issue when certifying an organization to conduct e-commerce.
The Payment Card Industry Data Security Standard (PCI DSS) has strict requirements in terms of protecting consumer
transactions.
As SSL/TLS evolved, many customers with older browsers still required legacy support. In addition, there is a lot of
potential for this technology to be implemented incorrectly. Both penetration testers and attackers are always on the
lookout for errors in this area.
Authentication Methods
The three main types of authentication methods supported by Web servers are:
Basic - The user name and password are encoded in Base64 but are otherwise sent in plain text. This encoding is
needed to support an extended set of characters.
Base64 encoding takes the binary representation of a string and repackages it into 6 bits per character. This method is
also a common way to send binary files over systems that are meant for text only like Usenet.
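The following sketch shows why Basic authentication offers no secrecy: the header value is just the Base64 form of "user:password" and decodes back with one call. The helper names are assumptions for this example; the sample credentials are the well-known pair used in the HTTP authentication RFCs.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an 'Authorization' header for Basic authentication."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic(header_value: str):
    """Recover (user, password) from a Basic header value; no key is needed."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


if __name__ == "__main__":
    header = basic_auth_header("Aladdin", "open sesame")
    print(header)
    print(decode_basic(header))
```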
Digest - Incorporates a nonce value, a string that is generated only one time, which is added to a hash of the
password. The design was created to combat chosen plain text attacks, in which the attacker takes a combination of
characters and attempts to reproduce the encrypted stream.
Additional features attempt to prevent replay attacks by ensuring that the nonce value has a
correlation with time. Replay attacks happen when credentials are sniffed from the Internet and presented to the server
at a later time in the hope of being authenticated.
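As a sketch of how the nonce enters the calculation, the code below computes a Digest response in its simplest form, the MD5 variant without the optional "qop" extension, as laid out in RFC 2617. The realm, URI and nonce values in the usage section are made up, and the helper names are assumptions for this example.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of a string as lowercase hexadecimal."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(user, realm, password, method, uri, nonce) -> str:
    """Compute the Digest 'response' value (simple form, without qop)."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")   # hash tied to the password
    ha2 = md5_hex(f"{method}:{uri}")              # hash tied to the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")        # nonce binds them together


if __name__ == "__main__":
    first = digest_response("alice", "example", "secret", "GET", "/private", "nonce-1")
    second = digest_response("alice", "example", "secret", "GET", "/private", "nonce-2")
    # A response captured for one nonce does not match a different nonce.
    print(first != second)
```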
Application - Authentication methods are supported by either the Web server or are custom built depending
on the platform.
The Apache Web server controls access through a file known as .htaccess, which points to a password file
(conventionally named .htpasswd) holding the credentials. In some newer setups these files are kept outside of the
website's file system, but earlier setups had them placed in a potentially vulnerable location.
The IIS Web server supports integrated authentication with Windows domain services. This means that
the user account credentials are stored in the same way as they would be for a local logon.
Database applications have been custom built for storing credentials. These involve checking that database for information
that is supplied by the user from a login page. It is up to the application developer to assure hashing is properly
implemented and that the SSL connection is established before the login process occurs.
These applications can be subject to brute force password attacks using tools such as Brutus or
Hydra unless the developer has implemented a way to limit login attempts. It is common for Captchas to be presented
after three to five failed login attempts. Captchas are the distorted images used
in a form to determine whether the input is human or machine generated.
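A minimal sketch of limiting login attempts on the server side. The threshold of three failures, the in-memory dictionary and the class name LoginLimiter are all assumptions for illustration; a real application would persist the counters and decide for itself whether to show a Captcha or lock the account.

```python
MAX_FAILURES = 3  # assumed threshold, in the three-to-five range mentioned above


class LoginLimiter:
    """Count failed logins per user and flag when a challenge is due."""

    def __init__(self, max_failures: int = MAX_FAILURES):
        self.max_failures = max_failures
        self.failures = {}  # user name -> consecutive failed attempts

    def needs_challenge(self, user: str) -> bool:
        """True once the user has reached the failure threshold."""
        return self.failures.get(user, 0) >= self.max_failures

    def record(self, user: str, success: bool) -> None:
        """Reset the counter on success, increment it on failure."""
        if success:
            self.failures.pop(user, None)
        else:
            self.failures[user] = self.failures.get(user, 0) + 1


if __name__ == "__main__":
    limiter = LoginLimiter()
    for _ in range(3):
        limiter.record("alice", success=False)
    print(limiter.needs_challenge("alice"))
```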
How Web Applications Work
There are three basic layers that Web Applications work on. These are:
- The Presentation Layer
- The Logic Layer
- The Database Layer
The Presentation Layer
The Presentation Layer is where the code gets processed in the visitor's browser. The browser's job
in a Web application is to resolve DNS addresses, make HTTP requests, receive the page and all resources within the
page and then, through a variety of rendering engines and plug-ins, which are also Browser Helper Objects (BHOs),
render the page visually.
There are three main languages involved which, taken together, are called Dynamic Hypertext Markup Language (DHTML).
HTML - Hypertext Markup Language provides a way to describe the structure of a page. It marks the
elements such as headers, paragraphs, articles as well as other resources.
CSS - Cascading Style Sheets handle the "look and feel" of a Web page. Browsers have a default way of
displaying a Web page, but by using CSS, the designer can provide a unique "style" to the page. The flexibility of this
design allows the same page content to be usable in a variety of browser clients.
JavaScript - This is a scripting language that provides interactive elements to the document and can
access the object modeled by the HTML as well as objects that are built into the browser.
The technology known as Asynchronous JavaScript and XML (AJAX) has transformed the client experience. AJAX is a suite
of techniques that uses the XMLHttpRequest Application Programming Interface (API) to send HTTP requests and pass results
directly to the scripting object in the page on the client side. This enables continuous conversation between the
client and server.
HTML parsing is forgiving of poorly written code; AJAX, however, requires unambiguous object models created by the markup
of page elements. eXtensible Markup Language (XML) provides the specification for well-formed markup along with the
ability to create an entirely new vocabulary for describing objects. CSS will still be used to define the visual properties
of the page.
AJAX programming is much more involved than standard DHTML, but threats still exist if the application is not coded properly.
Having the server side validate the form data is a big improvement, as it prevents the controls from being bypassed on the
client side, but XSS and SQL injection attacks are still possible.
The client side can still be manipulated by saving the source code of the page, editing it in a text editor, saving it and
opening the page again in the browser. JavaScript that is meant to validate the form can be removed, and parameters
written to cookies can be altered as well, even if the cookie is encrypted when it is stored on the hard drive.
The Logic Layer
Through HTTP, the Logic Layer processes active code within the pages of a Web site. When a client requests a page that
is recognized as having code that must be processed, the server runs the code, and the output is a complete Web page,
in the form of a text string, that is provided to the client in response.
Web sites that need to generate a standard page but with different information on that page, such as Google, use
"on-the-fly" logic layer processing. This process initiates a connection to a database, makes a query and then displays
the content needed per the client's request.
Any language can be used for the server side functionality of a Web application. Common Gateway Interface (CGI) is the
specification describing how to create Web applications to meet the unique needs of the Internet environment and to
cooperate with HTTP and older protocols. Some examples include:
- PERL (.pl)
- PHP (.php)
- Active Server Pages (.asp / .aspx / asp.NET)
- Cold Fusion Markup Language (.cfml)
- Ruby
PERL (Practical Extraction and Report Language) was originally designed to replace command line
tools such as sed and awk with a set of string parsing libraries. Because it was originally meant for sorting through
large log files, PERL is an ideal CGI language given the text-based nature of HTTP messages.
PHP originally stood for "Personal Home Pages" but was later changed to "Hypertext Pre-Processor" and is
now known only as PHP. It was created to be a CGI language and has a very active community of developers as well as
its share of security issues throughout the years.
Active Server Pages is a CGI language supported by Microsoft and the Internet Information Services (IIS)
Web server. The .NET version supports server side form validation and other enhanced features that make up for the
deficiencies of the .asp libraries.
An attacker will look for a file at the root of the Web directory if the site is using .asp. This file is called
"global.asa." It holds the main configuration of the website and might contain hard-coded database
connection strings and passwords.
Owned by Adobe (previously Macromedia), Cold Fusion Markup Language (CFML) is a tag-based syntax that
allows the developer to easily define reusable code functions that can be called at any time from these "tags."
A relative newcomer to the Web application crowd, Ruby aims to take the idea of Rapid Application
Development (RAD) to a whole new level by providing an Application Programming Interface (API) of many commonly used
functions that can be reused with little to no customization.
Many scripts are available for these languages, so programmers can find the code they need without having to rewrite
it. This is commonly known as "Shrink-Wrapped Code." Any existing vulnerabilities will propagate to the website
where the code is used. This means that the developer must analyze the code to ensure there are no backdoors or other
known issues. An attacker will look for any obvious signs of code reuse and perform a Google search to locate
additional vulnerable websites.
The Database Layer
When a Web page is executed at the logic layer, it is often necessary to start a session with a database server and
pass it an SQL request. Anything from user credentials to Binary Large Objects (BLOBs) can be stored in the database,
which is often populated through other applications such as a content management system.
The results that come back from a successful SQL request are processed by the logic layer and formatted into
standard HTML while the document is being prepared to be sent to the requesting party. Each time a logic layer
script runs, a new session is established. It then closes once the transaction is complete. To establish this
connection, a driver is used.
On a Windows system, the administrator uses the Data Connection applet in the Control Panel to set up a Data Source
Name (DSN). It is then simple in the .asp code to construct an object based on the DSN and pass it an SQL
query. The end result is the same regardless of which driver is used: credentials are passed along with a Layer 5
session request, an SQL query is submitted, the result is returned and then the connection is closed.
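The open, query, format and close cycle can be sketched with Python's built-in sqlite3 module standing in for the database server. The products table, its rows and the function names are invented for this example, and an in-memory database replaces the DSN and credentials that a real deployment would use.

```python
import sqlite3
from html import escape


def open_demo_db() -> sqlite3.Connection:
    """Stand-in for connecting through a driver; builds a small demo table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("Widget", 9.5), ("Gadget", 12.0)],
    )
    return conn


def render_page(connect=open_demo_db) -> str:
    """Open a session, run one query, close the session, return HTML."""
    conn = connect()
    try:
        rows = conn.execute(
            "SELECT name, price FROM products ORDER BY name"
        ).fetchall()
    finally:
        conn.close()  # the session ends once the transaction is complete
    items = "".join(
        f"<li>{escape(name)}: {price:.2f}</li>" for name, price in rows
    )
    return f"<html><body><ul>{items}</ul></body></html>"


if __name__ == "__main__":
    print(render_page())
```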
If an attacker can obtain the credentials and determine the database technology, the attacker can connect to the
appropriate port or ports with a front-end tool and have access to all data. Another possibility is to manipulate the
SQL query all the way from the presentation layer. This is the basis of an SQL Injection attack.
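The difference between a query built by string concatenation and a parameterized query can be shown with sqlite3. Everything here is a made-up demo: the users table, the stored plain text password (kept unhashed only to keep the sketch short) and the function names are assumptions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a single demo account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'wonderland')")
    return conn


def login_vulnerable(conn, name: str, password: str) -> bool:
    """Builds the SQL text from user input, so input can change the query."""
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_parameterized(conn, name: str, password: str) -> bool:
    """Passes user input as data; it is never parsed as SQL."""
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"
    print(login_vulnerable(conn, "alice", payload))
    print(login_parameterized(conn, "alice", payload))
    conn.close()
```

In the vulnerable version the payload closes the quoted string and appends a condition that is always true, so the password check no longer decides the outcome.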
The Attacks and Risks of Web Applications
Here are 13 known attack types that focus on Web applications.
Banner Grabbing occurs during the information discovery phase of an attack. It is used to discover the
Web server and operating system versions.
Denial of Service (DoS) attacks are used to either embarrass the owner of the website or as a method of
extortion.
Password Guessing is used, as the name suggests, to guess a user's password. One of the best defenses against this
type of attack is to limit how many password attempts can be made before the account is locked.
Robots.txt File Abuse reveals to the attacker what is not meant to be seen online. Robots, also known
as Spiders, are search engine tools that crawl the directories of a website looking for pages to catalog into the search
engine. The robots.txt file, placed in the root of the website, tells a robot what it can and cannot index. It also
shows an attacker where to find valuable information.
Offline Browsing is at times more efficient for the attacker because they can download the website first
and then search through it offline for information such as keywords, email addresses, names, etc.
Hidden Form Fields are not intended to display in the browser window but can be seen in the source code
of the page. They can carry important data that is sent back to the server with the form submission.
Directory Traversal is also known as "navigating to the parent" directory. The root of the website is
the root of what the user can see on the system. On some systems it is possible to navigate to several parents and then
drill down into the file system itself.
URL Obfuscation is also known as a hyper-link trick and is a type of attack where the real URL that a
user is directed to is obfuscated, or concealed, in an attempt to encourage the user to click through to the spoofed
Web site.
Cookie Stealing is the process of obtaining a visitor's cookies, the small pieces of data that a Web application
stores on the client side and that can record, among other things, the links the visitor clicks. The ways that
cookie stealing can occur are:
- Having physical access to a system, allowing the attacker to simply copy the cookies out to a file.
- Using a sniffer or proxy server to eavesdrop on unencrypted cookies as they are submitted with the next page request.
- Triangulating cookie tokens on the back end of an application.
Session ID Hijacking is a process of sniffing a session ID string from a legitimate user and then using
that stolen session ID to start a new session with the website. Session IDs are strings of characters
that are associated with a visitor's current visit to a website.
Cross-Site Scripting (XSS) takes advantage of dynamically generated Web pages. In an XSS attack, a Web
application is sent a script that activates when it is read by a browser or by an application that has not
protected itself against cross-site scripting. This attack was originally known as a CSS attack, but the name was
changed to XSS; it is still sometimes referred to as JavaScript injection.
SQL Injection, working on similar principles to Cross-Site Scripting, is when malicious code is embedded
in the input to a poorly designed application and then passed to the back-end database. The malicious data then
produces database query results or actions that should never have been executed.
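For the Cross-Site Scripting entry in the list above, the sketch below contrasts a page fragment that echoes user input as-is with one that escapes it first. It uses the standard html.escape function; the function names and the sample comment are assumptions for illustration, and escaping is only one of several defenses an application might apply.

```python
from html import escape


def render_comment_unsafe(comment: str) -> str:
    """Echo user input straight into the page; a script tag stays live."""
    return f"<p>{comment}</p>"


def render_comment_escaped(comment: str) -> str:
    """Escape markup characters so the browser shows them as text."""
    return f"<p>{escape(comment)}</p>"


if __name__ == "__main__":
    comment = "<script>alert(1)</script>"
    print(render_comment_unsafe(comment))
    print(render_comment_escaped(comment))
```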