Many security problems affect systems that are accessible via Web applications. One example is Cross-Site Scripting (XSS) attacks, wherein a script is injected into an application and executed by the browser or server, granting a malicious user unauthorized access to system resources or sensitive information. On the Top 10 2010 list published by the Open Web Application Security Project (OWASP), XSS is listed as number two. “Cross-Site Scripting (XSS) attacks occur when: 1. Data enters a Web application through an untrusted source, most frequently a web request. 2. The data is included in dynamic content that is sent to a web user without being validated for malicious code” [1].
XSS attacks of type 1 are commonly referred to as First Order, or reflected, XSS attacks. This type of attack reflects information immediately from the server. The user is either the attacker himself or the victim of a social engineering ruse (e.g., an emailed link containing the attack). Second Order, or stored, XSS attacks are first stored permanently on the server, then later launched by an unsuspecting user. In both cases, the XSS attacks are successful only when certain vulnerabilities exist in the application; more precisely, when such vulnerabilities remain unresolved. These vulnerabilities primarily involve allowing executable inputs from users (First Order XSS attack) or from other applications or existing database elements (Second Order XSS attack) to be directly or indirectly assigned to the outputs of the Web application without proper sanitization. As the outputs of Web applications are executed by the browser, inputs that influence the outputs can inject unwanted or malicious code that is then executed. This classification of XSS injection attacks aligns with the Code Injection Attack definition provided by Ray and Ligatti [2].
Research on XSS aims to find vulnerabilities and/or prevent attacks. One challenge in attack prevention is discerning which inputs will result in attacks.
Some basic approaches have been deployed to detect and stop attacks by checking for script tags and by verifying same origin via HTML “referer” tags [3]. However, these techniques may not be effective, as sophisticated attacks can easily circumvent tag-based detection, and HTML “referer” tags are optional and possibly unavailable. In the recent past, a number of more complex detection techniques [4]–[11] have been proposed that are based on either static analysis or runtime monitoring. Static analysis uses traditional program analysis techniques to identify vulnerable code segments via taint analysis and instruments these segments to avoid their exploitation at runtime. Typically, such techniques are not suited to finding exploitation of vulnerabilities resulting from conditional copy. Conditional copy is a technique used in code-interference based injection attacks, wherein one variable value is transferred into another (e.g., character-by-character in a conditional code segment) with direct dependency of the copies or other data operations. Runtime monitoring, on the other hand, relies on a library of attack patterns or a specification of non-attack (allowable) patterns to detect potential attacks. As a result, runtime monitoring can incur prohibitively large overhead if non-vulnerable variables are unnecessarily monitored against attack patterns.
Our two-phase method employs both static analysis and runtime monitoring. The static analysis phase includes application translation and concolic unit testing techniques. The runtime monitoring phase uses the facts learned from static analysis to monitor the relevant variables in the application. Any existing runtime monitoring method can be used in our runtime monitoring phase as long as it is coupled with the information generated in our static analysis phase. Therefore, in this paper, we emphasize the static analysis phase.
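To make the notion of conditional copy concrete, the following Java sketch contrasts a direct copy with one possible form of conditional copy. It is an illustration under our own assumptions, not code from the applications studied: the class and method names are hypothetical, and only ASCII characters are handled. In the conditional variant the input is read only inside branch conditions, so an analysis that follows explicit assignments alone would see no assignment from input to output.

```java
// Illustrative sketch; class and method names are hypothetical.
class ConditionalCopyDemo {

    // Direct copy: every output character is assigned from the input,
    // so an explicit data dependency links input to output.
    static char[] directCopy(char[] input) {
        char[] output = new char[input.length];
        for (int i = 0; i < input.length; i++) {
            output[i] = input[i];
        }
        return output;
    }

    // Conditional copy: the input is only read inside branch conditions.
    // Each output character is assigned from the loop counter c, never from input.
    // Assumption for brevity: only ASCII characters (codes 1..127) are reproduced.
    static char[] conditionalCopy(char[] input) {
        char[] output = new char[input.length];
        for (int i = 0; i < input.length; i++) {
            for (char c = 1; c < 128; c++) {
                if (input[i] == c) {
                    output[i] = c;
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        char[] input = "<script>".toCharArray();
        System.out.println(new String(directCopy(input)));
        System.out.println(new String(conditionalCopy(input)));
    }
}
```

Both methods reproduce the same ASCII input, which is why an output fed by the second method can still carry an injected tag even though no statement assigns the input to it.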
Method
Our method consists of the following steps:
1) Preprocessing: Determine the inputs and outputs according to the grammar of the Web application language (JSP in our case). We statically process the JSP Web application one file at a time to identify the inputs and outputs of each page generated from the files. We define a grammar based on the application language to capture all variables, request and response parameters, session variables, tags, etc. These are easily enumerated from the constructs of JSP (or of any Web application language), which include HTML tags and embedded Java code. Any information that the application expects from other pages within the application (e.g., session variables) is also identified as input, since we treat the JSP file as an atomic unit. This aligns with Wu and Offutt [18]; the authors consider an “HTML file or section of a server program that prints HTML” as an atomic section in modeling and testing Web-based applications.
2) Translation: Convert relevant parts of the Web application to a test-program (Java in our case) that can be tested easily using existing concolic testing engines. Once preprocessing is complete, the file translation is performed. Note that the result of translation, the Java program, maintains the following aspects of the original JSP application: the relationships between program variables, inputs and outputs, and the control flow of the application. All other elements of the JSP application that relate to generating the HTML page are discarded (commented out or put in print statements in the result of translation). The JSP application of Figure 1(a) is translated to the resulting Java program as shown in Figure 1(b). The inputs, outputs, and the JSP application variables are also present in the Java program. The string variables are converted to character arrays, as concolic testing cannot handle strings; however, this does not result in any over/under-approximation in the context of vulnerability detection. Lines 1–8 in Figure 1(b) encode a constructor which mimics external input (in this case, the username parameter in the JSP application). Line 13 in Figure 1(b) is the translation of Line 3 of the JSP application in Figure 1(a). Lines 15–18 are the translation of the conditional copy of the JSP application, and Line 19 corresponds to the assignment of uname to the session variable uName in the JSP application.
[Figure: Finite state automaton representing vulnerable output]
3) Testing: Execute the test-program in a testing engine which automatically generates test cases determining vulnerabilities in the original Web application in terms of input/output dependencies. We use the concolic testing tool jCute [12] to automatically generate input test cases that determine vulnerable outputs: outputs that are directly or indirectly assigned from inputs and can contain scripting tags. Concolic testing utilizes a combination of concrete and symbolic execution to generate test cases and to effectively guide exploration of new program paths, respectively. The method relies on careful placement of assertion statements and identifies test cases that can violate the assertions. As our focus is injection attacks, we check whether the output of the Java program can evaluate to a sequence of characters representing a script. In other words, we need to check whether the sequence contains “<” followed later by “>”. The finite state automaton (FSA) in Figure 2 represents such a sequence: zero or more characters followed by “<”, followed by a finite sequence of characters, followed by “>”, and finally followed by zero or more characters.
4) Instrumentation: Insert calls to a monitor (a Java class in our case) in the original Web application at program points before vulnerable outputs are used. This monitor detects potential attacks. Applying jCute, we automatically obtain test cases for which some output variable can be assigned to a three-character sequence starting with “<” and ending with “>”. These are the outputs that can be exploited by injection attacks, and therefore their valuation must be monitored at runtime. We refer to these outputs as vulnerable outputs. Note that there can be many outputs in the application and a small number of them may be vulnerable. By utilizing jCute, we minimize the number of outputs that need to be monitored.
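The four steps above can be sketched end to end in a few lines of Java. This is a simplified illustration under stated assumptions, not our implementation and not the program of Figure 1(b): all class and method names are hypothetical, a regular expression stands in for the grammar of step 1, the translated page uses a plain element-wise copy for brevity, the test input is supplied by hand where a concolic engine such as jCute would generate it, and the automaton check assumes that “<>” with nothing in between also counts as a match.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; every name below is hypothetical.
class PipelineSketch {

    // Step 1 (preprocessing): a regular expression stands in for the grammar.
    // It collects the names passed to request.getParameter("...") as inputs.
    static List<String> findInputs(String jsp) {
        List<String> names = new ArrayList<>();
        Matcher m = Pattern
                .compile("request\\.getParameter\\(\\s*\"([^\"]+)\"\\s*\\)")
                .matcher(jsp);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Step 2 (translation): page logic rewritten over character arrays.
    // The returned array stands in for the session variable uName.
    static char[] translatedPage(char[] username) {
        char[] uName = new char[username.length];
        for (int i = 0; i < username.length; i++) {
            uName[i] = username[i]; // plain copy here, for brevity
        }
        return uName;
    }

    // Step 3 (testing): the property checked on each output, written as a
    // three-state automaton: 0 = no '<' yet, 1 = '<' seen, 2 = '>' seen after '<'.
    static boolean looksLikeTag(char[] output) {
        int state = 0;
        for (char ch : output) {
            if (state == 0 && ch == '<') {
                state = 1;
            } else if (state == 1 && ch == '>') {
                state = 2;
            }
        }
        return state == 2;
    }

    // Step 4 (instrumentation): the monitor inspects only outputs that the
    // testing step reported as vulnerable; all other outputs are skipped.
    static boolean monitor(Set<String> vulnerable, String outputName, char[] value) {
        return vulnerable.contains(outputName) && looksLikeTag(value);
    }

    public static void main(String[] args) {
        String jsp = "<% String uname = request.getParameter(\"username\"); %>";
        System.out.println("inputs: " + findInputs(jsp));

        // Hand-written witness; a concolic engine would search for one.
        char[] witness = {'<', 'a', '>'};
        Set<String> vulnerable = new HashSet<>();
        if (looksLikeTag(translatedPage(witness))) {
            vulnerable.add("uName");
        }
        System.out.println("vulnerable outputs: " + vulnerable);
        System.out.println("flagged: " + monitor(vulnerable, "uName", "<b>".toCharArray()));
    }
}
```

In this sketch the monitor returns false immediately for any output that is not in the vulnerable set, which reflects the motivation for the static phase: the fewer outputs reported as vulnerable, the less work is done at runtime.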