413 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			HTML
		
	
	
	
		
			Vendored
		
	
	
	
			
		
		
	
	
			413 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			HTML
		
	
	
	
		
			Vendored
		
	
	
	
| <?xml version="1.0" encoding="UTF-8"?>
 | |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 | |
|   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 | |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 | |
|   <head>
 | |
|     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
 | |
|     <meta name="description" content="Describes config schema framework in HTML Purifier." />
 | |
|     <link rel="stylesheet" type="text/css" href="./style.css" />
 | |
|     <title>Config Schema - HTML Purifier</title>
 | |
|   </head>
 | |
|   <body>
 | |
| 
 | |
|     <h1>Config Schema</h1>
 | |
| 
 | |
|     <div id="filing">Filed under Development</div>
 | |
|     <div id="index">Return to the <a href="index.html">index</a>.</div>
 | |
|     <div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
 | |
| 
 | |
|     <p>
 | |
|       HTML Purifier has a fairly complex system for configuration. Users
 | |
|       interact with a <code>HTMLPurifier_Config</code> object to
 | |
|       set configuration directives. The values they set are validated according
 | |
|       to a configuration schema, <code>HTMLPurifier_ConfigSchema</code>.
 | |
|     </p>
 | |
| 
 | |
|     <p>
 | |
|       The schema is mostly transparent to end-users, but if you're doing development
 | |
|       work for HTML Purifier and need to define a new configuration directive,
 | |
|       you'll need to interact with it. We'll also talk about how to define
 | |
|       userspace configuration directives at the very end.
 | |
|     </p>
 | |
| 
 | |
|     <h2>Write a directive file</h2>
 | |
| 
 | |
|     <p>
 | |
|       Directive files define configuration directives to be used by
 | |
|       HTML Purifier. They are placed in <code>library/HTMLPurifier/ConfigSchema/schema/</code>
 | |
|       in the form <code><em>Namespace</em>.<em>Directive</em>.txt</code> (I
 | |
|       couldn't think of a more descriptive file extension.)
 | |
|       Directive files are actually what we call <code>StringHash</code>es,
 | |
|       i.e. associative arrays represented in a string form reminiscent of
 | |
|       <a href="http://qa.php.net/write-test.php">PHPT</a> tests. Here's a
 | |
|       sample directive file, <code>Test.Sample.txt</code>:
 | |
|     </p>
 | |
| 
 | |
|     <pre>Test.Sample
 | |
| TYPE: string/null
 | |
| DEFAULT: NULL
 | |
| ALLOWED: 'foo', 'bar'
 | |
| VALUE-ALIASES: 'baz' => 'bar'
 | |
| VERSION: 3.1.0
 | |
| --DESCRIPTION--
 | |
| This is a sample configuration directive for the purposes of the
 | |
| <code>dev-config-schema.html<code> documentation.
 | |
| --ALIASES--
 | |
| Test.Example</pre>
 | |
| 
 | |
|     <p>
 | |
|       Each of these segments has a specific meaning:
 | |
|     </p>
 | |
| 
 | |
|     <table class="table">
 | |
|       <thead>
 | |
|         <tr>
 | |
|           <th>Key</th>
 | |
|           <th>Example</th>
 | |
|           <th>Description</th>
 | |
|         </tr>
 | |
|       </thead>
 | |
|       <tbody>
 | |
|         <tr>
 | |
|           <td>ID</td>
 | |
|           <td>Test.Sample</td>
 | |
|           <td>The name of the directive, in the form Namespace.Directive
 | |
|           (implicitly the first line)</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>TYPE</td>
 | |
|           <td>string/null</td>
 | |
|           <td>The type of variable this directive accepts. See below for
 | |
|           details. You can also add <code>/null</code> to the end of
 | |
|           any basic type to allow null values too.</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>DEFAULT</td>
 | |
|           <td>NULL</td>
 | |
|           <td>A parseable PHP expression of the default value.</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>DESCRIPTION</td>
 | |
|           <td>This is a...</td>
 | |
|           <td>An HTML description of what this directive does.</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>VERSION</td>
 | |
|           <td>3.1.0</td>
 | |
|           <td><em>Recommended</em>. The version of HTML Purifier this directive was added.
 | |
|           Directives that have been around since 1.0.0 don't have this,
 | |
|           but any new ones should.</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>ALIASES</td>
 | |
|           <td>Test.Example</td>
 | |
|           <td><em>Optional</em>. A comma separated list of aliases for this directive.
 | |
|           This is most useful for backwards compatibility and should
 | |
|           not be used otherwise.</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>ALLOWED</td>
 | |
|           <td>'foo', 'bar'</td>
 | |
|           <td><em>Optional</em>. Set of allowed value for a directive,
 | |
|           a comma separated list of parseable PHP expressions. This
 | |
|           is only allowed string, istring, text and itext TYPEs.</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>VALUE-ALIASES</td>
 | |
|           <td>'baz' => 'bar'</td>
 | |
|           <td><em>Optional</em>. Mapping of one value to another, and
 | |
|           should be a comma separated list of keypair duples. This
 | |
|           is only allowed string, istring, text and itext TYPEs.</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>DEPRECATED-VERSION</td>
 | |
|           <td>3.1.0</td>
 | |
|           <td><em>Not shown</em>. Indicates that the directive was
 | |
|           deprecated this version.</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>DEPRECATED-USE</td>
 | |
|           <td>Test.NewDirective</td>
 | |
|           <td><em>Not shown</em>. Indicates what new directive should be
 | |
|           used instead. Note that the directives will functionally be
 | |
|           different, although they should offer the same functionality.
 | |
|           If they are identical, use an alias instead.</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>EXTERNAL</td>
 | |
|           <td>CSSTidy</td>
 | |
|           <td><em>Not shown</em>. Indicates if there is an external library
 | |
|           the user will need to download and install to use this configuration
 | |
|           directive. As of right now, this is merely a Google-able name; future
 | |
|           versions may also provide links and instructions.</td>
 | |
|         </tr>
 | |
|       </tbody>
 | |
|     </table>
 | |
| 
 | |
|     <p>
 | |
|       Some notes on format and style:
 | |
|     </p>
 | |
| 
 | |
|     <ul>
 | |
|       <li>
 | |
|         Each of these keys can be expressed in the short format
 | |
|         (<code>KEY: Value</code>) or the long format
 | |
|         (<code>--KEY--</code> with value beneath). You must use the
 | |
|         long format if multiple lines are needed, or if a long format
 | |
|         has been used already (that's why <code>ALIASES</code> in our
 | |
|         example is in the long format); otherwise, it's user preference.
 | |
|       </li>
 | |
|       <li>
 | |
|         The HTML descriptions should be wrapped at about 80 columns; do
 | |
|         not rely on editor word-wrapping.
 | |
|       </li>
 | |
|     </ul>
 | |
| 
 | |
|     <p>
 | |
|       Also, as promised, here is the set of possible types:
 | |
|     </p>
 | |
| 
 | |
|     <table class="table">
 | |
|       <thead>
 | |
|         <tr>
 | |
|           <th>Type</th>
 | |
|           <th>Example</th>
 | |
|           <th>Description</th>
 | |
|         </tr>
 | |
|       </thead>
 | |
|       <tbody>
 | |
|         <tr>
 | |
|           <td>string</td>
 | |
|           <td>'Foo'</td>
 | |
|           <td><a href="http://docs.php.net/manual/en/language.types.string.php">String</a> without newlines</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>istring</td>
 | |
|           <td>'foo'</td>
 | |
|           <td>Case insensitive ASCII string without newlines</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>text</td>
 | |
|           <td>"A<em>\n</em>b"</td>
 | |
|           <td>String with newlines</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>itext</td>
 | |
|           <td>"a<em>\n</em>b"</td>
 | |
|           <td>Case insensitive ASCII string without newlines</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>int</td>
 | |
|           <td>23</td>
 | |
|           <td>Integer</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>float</td>
 | |
|           <td>3.0</td>
 | |
|           <td>Floating point number</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>bool</td>
 | |
|           <td>true</td>
 | |
|           <td>Boolean</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>lookup</td>
 | |
|           <td>array('key' => true)</td>
 | |
|           <td>Lookup array, used with <code>isset($var[$key])</code></td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>list</td>
 | |
|           <td>array('f', 'b')</td>
 | |
|           <td>List array, with ordered numerical indexes</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>hash</td>
 | |
|           <td>array('key' => 'val')</td>
 | |
|           <td>Associative array of keys to values</td>
 | |
|         </tr>
 | |
|         <tr>
 | |
|           <td>mixed</td>
 | |
|           <td>new stdclass</td>
 | |
|           <td>Any PHP variable is fine</td>
 | |
|         </tr>
 | |
|       </tbody>
 | |
|     </table>
 | |
| 
 | |
|     <p>
 | |
|       The examples represent what will be returned out of the configuration
 | |
|       object; users have a little bit of leeway when setting configuration
 | |
|       values (for example, a lookup value can be specified as a list;
 | |
|       HTML Purifier will flip it as necessary.) These types are defined
 | |
|       in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/VarParser.php">
 | |
|       library/HTMLPurifier/VarParser.php</a>.
 | |
|     </p>
 | |
| 
 | |
|     <p>
 | |
|       For more information on what values are allowed, and how they are parsed,
 | |
|       consult <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
 | |
|       library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>, as well
 | |
|       as <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Interchange/Directive.php">
 | |
|       library/HTMLPurifier/ConfigSchema/Interchange/Directive.php</a> for
 | |
|       the semantics of the parsed values.
 | |
|     </p>
 | |
| 
 | |
|     <h2>Refreshing the cache</h2>
 | |
| 
 | |
|     <p>
 | |
|       You may have noticed that your directive file isn't doing anything
 | |
|       yet. That's because it hasn't been added to the runtime
 | |
|       <code>HTMLPurifier_ConfigSchema</code> instance. Run
 | |
|       <code>maintenance/generate-schema-cache.php</code> to fix this.
 | |
|       If there were no errors, you're good to go! Don't forget to add
 | |
|       some unit tests for your functionality!
 | |
|     </p>
 | |
| 
 | |
|     <p>
 | |
|       If you ever make changes to your configuration directives, you
 | |
|       will need to run this script again.
 | |
|     </p>
 | |
|     <h2>Adding in-house schema definitions</h2>
 | |
| 
 | |
|     <p>
 | |
|       Placing stuff directly in HTML Purifier's source tree is generally not a
 | |
|       good idea, so HTML Purifier 4.0.0+ has some facilities in place to make your
 | |
|       life easier.
 | |
|     </p>
 | |
| 
 | |
|     <p>
 | |
|       The first is to pass an extra parameter to <code>maintenance/generate-schema-cache.php</code>
 | |
|       with the location of your directory (relative or absolute path will do). For example,
 | |
|       if I'm storing my custom definitions in <em>/var/htmlpurifier/myschema</em>, run:
 | |
|       <code>php maintenance/generate-schema-cache.php /var/htmlpurifier/myschema</code>.
 | |
|     </p>
 | |
| 
 | |
|     <p>
 | |
|       Alternatively, you can create a small loader PHP file in the HTML Purifier base
 | |
|       directory named <code>config-schema.php</code> (this is the same directory
 | |
|       you would place a <code>test-settings.php</code> file).  In this file, add
 | |
|       the following line for each directory you want to load:
 | |
|     </p>
 | |
| 
 | |
| <pre>$builder->buildDir($interchange, '/var/htmlpurifier/myschema');</pre>
 | |
| 
 | |
|     <p>You can even load a single file using:</p>
 | |
| 
 | |
| <pre>$builder->buildFile($interchange, '/var/htmlpurifier/myschema/MyApp.Directive.txt');</pre>
 | |
| 
 | |
|     <p>Storing custom definitions that you don't plan on sending back upstream in
 | |
|     a separate directory is <em>definitely</em> a good idea! Additionally, picking
 | |
|     a good namespace can go a long way to saving you grief if you want to use
 | |
|     someone else's change, but they picked the same name, or if HTML Purifier
 | |
|     decides to add support for a configuration directive that has the same name.</p>
 | |
| 
 | |
|     <!-- TODO: how to name directives that rely on naming conventions -->
 | |
| 
 | |
|     <h2>Errors</h2>
 | |
| 
 | |
|     <p>
 | |
|       All directive files go through a rigorous validation process
 | |
|       through <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Validator.php">
 | |
|       library/HTMLPurifier/ConfigSchema/Validator.php</a>, as well
 | |
|       as some basic checks during building. While
 | |
|       listing every error out here is out-of-scope for this document, we
 | |
|       can give some general tips for interpreting error messages.
 | |
|       There are two types of errors: builder errors and validation errors.
 | |
|     </p>
 | |
| 
 | |
|     <h3>Builder errors</h3>
 | |
| 
 | |
|     <blockquote>
 | |
|       <p>
 | |
|         <strong>Exception:</strong> Expected type string, got
 | |
|         integer in DEFAULT in directive hash 'Ns.Dir'
 | |
|       </p>
 | |
|     </blockquote>
 | |
| 
 | |
|     <p>
 | |
|       You can identify a builder error by the keyword "directive hash."
 | |
|       These are the easiest to deal with, because they directly correspond
 | |
|       with your directive file. Find the offending directive file (which
 | |
|       is the directive hash plus the .txt extension), find the
 | |
|       offending index ("in DEFAULT" means the DEFAULT key) and fix the error.
 | |
|       This particular error would occur if your default value is not the same
 | |
|       type as TYPE.
 | |
|     </p>
 | |
| 
 | |
|     <h3>Validation errors</h3>
 | |
| 
 | |
|     <blockquote>
 | |
|       <p>
 | |
|         <strong>Exception:</strong> Alias 3 in valueAliases in directive
 | |
|         'Ns.Dir' must be a string
 | |
|       </p>
 | |
|     </blockquote>
 | |
| 
 | |
|     <p>
 | |
|       These are a little trickier, because we're not actually validating
 | |
|       your directive file, or even the direct string hash representation.
 | |
|       We're validating an Interchange object, and the error messages do
 | |
|       not mention any string hash keys.
 | |
|     </p>
 | |
| 
 | |
|     <p>
 | |
|       Nevertheless, it's not difficult to figure out what went wrong.
 | |
|       Read the "context" statements in reverse:
 | |
|     </p>
 | |
| 
 | |
|     <dl>
 | |
|       <dt>in directive 'Ns.Dir'</dt>
 | |
|         <dd>This means we need to look at the directive file <code>Ns.Dir.txt</code></dd>
 | |
|       <dt>in valueAliases</dt>
 | |
|         <dd>There's no key actually called this, but there's one that's close:
 | |
|           VALUE-ALIASES. Indeed, that's where to look.</dd>
 | |
|       <dt>Alias 3</dt>
 | |
|         <dd>The value alias that is equal to 3 is the culprit.</dd>
 | |
|     </dl>
 | |
| 
 | |
|     <p>
 | |
|       In this particular case, you're not allowed to alias integers values to
 | |
|       strings values.
 | |
|     </p>
 | |
| 
 | |
|     <p>
 | |
|       The most difficult part is translating the Interchange member variable (valueAliases)
 | |
|       into a directive file key (VALUE-ALIASES), but there's a one-to-one
 | |
|       correspondence currently. If the two formats diverge, any discrepancies
 | |
|       will be described in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
 | |
|       library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>.
 | |
|     </p>
 | |
| 
 | |
|     <h2>Internals</h2>
 | |
| 
 | |
|     <p>
 | |
|       Much of the configuration schema framework's codebase deals with
 | |
|       shuffling data from one format to another, and doing validation on this
 | |
|       data.
 | |
|       The keystone of all of this is the <code>HTMLPurifier_ConfigSchema_Interchange</code>
 | |
|       class, which represents the purest, parsed representation of the schema.
 | |
|     </p>
 | |
| 
 | |
|     <p>
 | |
|       Hand-writing this data is unwieldy, however, so we write directive files.
 | |
|       These directive files are parsed by <code>HTMLPurifier_StringHashParser</code>
 | |
|       into <code>HTMLPurifier_StringHash</code>es, which then
 | |
|       are run through <code>HTMLPurifier_ConfigSchema_InterchangeBuilder</code>
 | |
|       to construct the interchange object.
 | |
|     </p>
 | |
| 
 | |
|     <p>
 | |
|       From the interchange object, the data can be siphoned into other forms
 | |
|       using <code>HTMLPurifier_ConfigSchema_Builder</code> subclasses.
 | |
|       For example, <code>HTMLPurifier_ConfigSchema_Builder_ConfigSchema</code>
 | |
|       generates a runtime <code>HTMLPurifier_ConfigSchema</code> object,
 | |
|       which <code>HTMLPurifier_Config</code> uses to validate its incoming
 | |
|       data. There is also an XML serializer, which is used to build documentation.
 | |
|     </p>
 | |
| 
 | |
|   </body>
 | |
| </html>
 | |
| 
 | |
| <!-- vim: et sw=4 sts=4
 | |
| -->
 | 
