Chapter 5. Validate All Input


Wisdom will save you from the ways of wicked men, from men whose words are perverse...

 Proverbs 2:12 (NIV)
Table of Contents
5.1. Basics of input validation
5.2. Input Validation Tools including Regular Expressions
5.2.1. Introduction to regular expressions
5.2.2. Using regular expressions for input validation
5.2.3. Regular expression denial of service (reDOS) attacks
5.3. Command line
5.4. Environment Variables
5.4.1. Some Environment Variables are Dangerous
5.4.2. Environment Variable Storage Format is Dangerous
5.4.3. The Solution - Extract and Erase
5.4.4. Don’t Let Users Set Their Own Environment Variables
5.5. File Descriptors
5.6. File Names
5.7. File Contents
5.8. Web-Based Application Inputs (Especially CGI Scripts)
5.9. Other Inputs
5.10. Human Language (Locale) Selection
5.10.1. How Locales are Selected
5.10.2. Locale Support Mechanisms
5.10.3. Legal Values
5.10.4. Bottom Line
5.11. Character Encoding
5.11.1. Introduction to Character Encoding
5.11.2. Introduction to UTF-8
5.11.3. UTF-8 Security Issues
5.11.4. UTF-8 Legal Values
5.11.5. UTF-8 Related Issues
5.12. Prevent Cross-site Malicious Content on Input
5.13. Filter HTML/URIs That May Be Re-presented
5.13.1. Remove or Forbid Some HTML Data
5.13.2. Encoding HTML Data
5.13.3. Validating HTML Data
5.13.4. Validating Hypertext Links (URIs/URLs)
5.13.5. Other HTML tags
5.13.6. Related Issues
5.14. Forbid HTTP GET To Perform Non-Queries
5.15. Counter SPAM
5.16. Limit Valid Input Time and Load Level

Some inputs come from untrusted users, so those inputs must be validated (filtered) before being used. We will first discuss the basics of input validation. The sections that follow discuss the different kinds of input a program receives; note that input includes process state such as environment variables and umask values. Not every input is under the control of an untrusted user, so you need only worry about those that are.
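As a rough illustration of treating process state as input, the following minimal sketch checks one environment variable against an allowlist and replaces the inherited umask with a known value. The pattern, the default value, and the function names `validated_lang` and `reset_umask` are hypothetical, chosen only for this example; a real program would choose them to match its own requirements.

```python
import os
import re

# Hypothetical allowlist for illustration only: a short locale-like
# value such as "fr" or "en_US". Anything else is rejected.
_SAFE_LANG = re.compile(r"[A-Za-z]{2,3}(_[A-Za-z]{2})?")


def validated_lang(environ, default="C"):
    """Return the LANG value only if it matches the allowlist, else the default."""
    value = environ.get("LANG", "")
    # fullmatch requires the whole string to match, so trailing
    # newlines or extra path characters cause rejection.
    if _SAFE_LANG.fullmatch(value):
        return value
    return default


def reset_umask(mask=0o077):
    """Replace the inherited umask with a known value and return the old one."""
    return os.umask(mask)


if __name__ == "__main__":
    # os.environ is inherited from the caller, so treat it as untrusted.
    print(validated_lang(os.environ))
    reset_umask()
```

The sketch accepts only values it recognizes as legal and falls back to a fixed default otherwise, rather than trying to enumerate and strip out bad values.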