OWASP Java HTML Sanitizer

General

Category: Free
Tag: HTML
License: BSD License
Registered: Jul 5, 2014
Link: https://github.com/owasp/java-html-sanitizer
See also: jsoup-annotations, Jericho, XTML, ksoup, RxRetroJsoup

Additional

Language: Java
Version: release-20180219.1 (Feb 19, 2018)
Created: Apr 27, 2015
Updated: Feb 19, 2018
Owner: OWASP
Contributors (10): mikesamuel, jmanico, cure53, jamesdaily, ronabop, rnnds, sbearcsiro, edbaker83, lillesand, pukomuko

A fast and easy-to-configure HTML sanitizer written in Java that lets you include HTML authored by third parties in your web application while protecting against cross-site scripting (XSS).

The library depends only on Guava and JSR 305. The other jars are needed only by the test suite, and JSR 305 is a compile-only dependency used solely for annotations.

This code was written with security best practices in mind, has an extensive test suite, and has undergone adversarial security review.


The Getting Started guide explains how to set up the library with or without Maven.

You can use prepackaged policies:

PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);
String safeHTML = policy.sanitize(untrustedHTML);

Or you can configure your own policy, as the tests demonstrate:

PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("a")
    .allowUrlProtocols("https")
    .allowAttributes("href").onElements("a")
    .requireRelNofollowOnLinks()
    .toFactory();
String safeHTML = policy.sanitize(untrustedHTML);

Or you can write a custom element policy, for example to turn headings (h1 through h6) into divs with a class derived from the original element name:

PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("p")
    .allowElements(
        new ElementPolicy() {
          @Override
          public String apply(String elementName, List<String> attrs) {
            // attrs alternates names and values, so this adds class="header-h1" etc.
            attrs.add("class");
            attrs.add("header-" + elementName);
            // Returning a different name renames the element.
            return "div";
          }
        }, "h1", "h2", "h3", "h4", "h5", "h6")
    .toFactory();
String safeHTML = policy.sanitize(untrustedHTML);
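
To see what a policy does with hostile input, it can help to run one end to end. The following is a minimal sketch, not part of the original documentation: it wraps the link-only policy shown above in a small class and feeds it a sample string. The class name LinkPolicyDemo, the helper clean(), and the sample input are assumptions made for illustration, and the sanitizer jar is assumed to be on the classpath.

```java
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

public class LinkPolicyDemo {
  // Same link-only policy as in the configuration example above.
  static final PolicyFactory LINKS_ONLY = new HtmlPolicyBuilder()
      .allowElements("a")
      .allowUrlProtocols("https")
      .allowAttributes("href").onElements("a")
      .requireRelNofollowOnLinks()
      .toFactory();

  // Hypothetical helper: sanitize a string with the policy above.
  static String clean(String untrustedHTML) {
    return LINKS_ONLY.sanitize(untrustedHTML);
  }

  public static void main(String[] args) {
    // Illustrative input: one acceptable link, one javascript: link,
    // and a script element.
    String untrustedHTML =
        "<a href=\"https://example.com/\">ok</a>"
        + "<a href=\"javascript:alert(1)\">bad</a>"
        + "<script>alert(2)</script>";
    System.out.println(clean(untrustedHTML));
  }
}
```

With this policy, the https link should come back with a rel attribute containing nofollow, while the javascript: URL and the script element should not appear in the output. PolicyFactory instances are immutable, so building the policy once in a static field and reusing it is a reasonable default.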

Note that the elements "a", "font", "img", "input" and "span" are dropped by default when they carry no attributes, even if the policy allows them. To let attribute-less instances of these elements through the filter, whitelist them explicitly with the allowWithoutAttributes() method.
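
The difference is easiest to see side by side. The sketch below is an illustration rather than code from the project: it builds two policies that both allow "span" and differ only in the allowWithoutAttributes() call. The class and field names are hypothetical, and the sanitizer jar is assumed to be on the classpath.

```java
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

public class BareSpanDemo {
  // "span" is allowed, but spans without attributes are still dropped.
  static final PolicyFactory DEFAULT_SPAN = new HtmlPolicyBuilder()
      .allowElements("span")
      .toFactory();

  // Explicitly permit spans that carry no attributes.
  static final PolicyFactory BARE_SPAN = new HtmlPolicyBuilder()
      .allowElements("span")
      .allowWithoutAttributes("span")
      .toFactory();

  public static void main(String[] args) {
    String untrustedHTML = "<span>hello</span>";
    // The first policy keeps only the text; the second keeps the element too.
    System.out.println(DEFAULT_SPAN.sanitize(untrustedHTML));
    System.out.println(BARE_SPAN.sanitize(untrustedHTML));
  }
}
```

In both cases the text content survives; only the surrounding tag is affected.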


Subscribe to the mailing list to be notified of known vulnerabilities. If you wish to report a vulnerability, please see AttackReviewGroundRules.


Thanks to everyone who has helped with criticism and code.