Parametric Search Appliance
The XML schema for entities is shown below. Optional elements are shown in square brackets; elements that may repeat are followed by ellipses. The syntax is largely compatible with the Google Search Appliance's Entity Recognition XML format, although there are some differences.
<?xml version="1.0"?>
<instances>
  <instance>
    <name>Counties</name>
    [<case_sensitive>N</case_sensitive>]
    [<apply_case>as_is</apply_case>]
    [<store_term_or_name>term</store_term_or_name>]
    [<store_regex_or_name>regex</store_regex_or_name>]
    [<pattern>(?:[[:upper:]]\w+\s+)+County</pattern>
     ...]
    [<term>Adams County</term>
     ...]
  </instance>
  ...
</instances>
The root element is <instances>, which contains one or more
entities, each defined in an <instance> element.  Each
<instance> has the following children:
<name> (required) - The name of the entity.  Like
    Parametric Fields, an entity name must be composed solely of 1 to
    29 alphanumerics or underscores (with the first character
    alphabetic), and the name must not be a SQL keyword.
<case_sensitive> (optional) - Whether <term>s
    match case-sensitively or not; a Y or N value.
    The default if unspecified is N.
<apply_case> (optional) - How to transform the case of
    text matches, before storing the entity.  One of the following
    values:
    as_is - Leave text as-is; no transformation
    lowercase - Lower-case the match
    uppercase - Upper-case the match
    titlecase - Title-case the match: capitalize the
        first letter of each word
    titlecase_first_word - Title-case just the first word
    The default if unspecified is as_is.  Note that only
    matches stored from document text are affected:
    <term> matches when <store_term_or_name> is
    name or term_tag, and <pattern> matches when
    <store_regex_or_name> is name, are not modified.
    This allows mixed-case <term> values - e.g. McDuff
    - to retain their custom-specified case when stored, while still
    canonicalizing the possibly-variant cases of <pattern>
    matches in text, when both are specified for the same entity.
<store_term_or_name> (optional) - What to store as the
    entity for <term> matches.  One of the following values:
    term - Store the text matched; this is the default
        if unspecified.  Useful if knowing which <term> matched
        is significant; e.g. when looking for a list of cities, and
        search results will be Grouped By city.
    name - Store the entity <name> value.
        Useful when just the existence of the entity matters, i.e.
        all the terms are synonymous.  (E.g. an entity named
        Water with terms water, H2O and
        dihydrogen monoxide, and any occurrence should be
        stored as Water.)
    term_tag - Store the <term> value.  Useful
        if the specific term matters, and it should be saved with the
        same case as in the <term>, not the text.  E.g. if a
        custom-case <term> like McDuff is set, it may
        match Mcduff, MCDUFF etc. in the text - the
        McDuff case variant is stored.
<store_regex_or_name> (optional) - What to store as the
    entity for <pattern> matches.  One of the following values:
    regex - Store the text matched; this is the default
        if unspecified.
    name - Store the entity <name> value.
        Useful if just the existence of the entity matters; e.g. the
        <pattern>s are looking for credit-card or phone
        numbers, and the exact digits do not matter, just the fact
        that the document contains a credit-card or phone number.
    regex_tagged_as_first_group - Store the text
        matched by the first parenthetical capture group of the
        <pattern>.  For example, the pattern "Mr\. (\w+)"
        could be used with regex_tagged_as_first_group to store
        just the last name found, without the "Mr." title.
        Note that REX syntax uses the \P and \F
        operators to indicate what part of the expression to store,
        and does not support capture groups; thus
        regex_tagged_as_first_group is not valid for REX
        <pattern>s.
<pattern> (optional; zero or more occurrences) - A
    regular expression (regex) to match entities in document text.
    The default syntax is that of Google's RE2 library.  REX syntax
    may also be used, by preceding the expression with \<rex\>.
    To store just part of the text matched, use a parenthetical
    capture group in the expression and set
    <store_regex_or_name> to
    regex_tagged_as_first_group; or use a REX expression with
    the \P and \F operators.
    Note: On some platforms, RE2 syntax is not supported, and
    REX syntax must be used.  These platforms give the
    error message "REX: RE2 not supported on this platform"
    when an entity file containing RE2 <pattern>s is uploaded.
    (RE2 is supported on Windows and on Linux 2.6 and later,
    except i686-unknown-linux2.6.17-64-32.)
    RE2 syntax is documented at
    https://github.com/google/re2/wiki/Syntax.
<term> (optional; zero or more occurrences) - A term to
    find as an entity in document text.  The term is searched for
    exactly, as a phrase (no quotes needed).  It is matched
    case-insensitively, unless <case_sensitive> is set to
    Y.
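As an illustration of how the term-related settings described above might combine, here is a minimal sketch of an entity file. The Water entity and the McDuff term come from the descriptions above; the entity name Clans and its pattern are hypothetical, chosen only for this example. Elements are listed in the same order as in the schema; whether other orderings are accepted is not stated here.

```xml
<?xml version="1.0"?>
<instances>
  <!-- All three terms are treated as synonyms: any match is stored
       as the entity name, Water -->
  <instance>
    <name>Water</name>
    <store_term_or_name>name</store_term_or_name>
    <term>water</term>
    <term>H2O</term>
    <term>dihydrogen monoxide</term>
  </instance>
  <!-- Hypothetical entity: the custom-case term is stored as written
       in <term> (term_tag), so it is not changed by <apply_case>;
       matches of the illustrative pattern are stored from document
       text, so they are title-cased before storing -->
  <instance>
    <name>Clans</name>
    <case_sensitive>N</case_sensitive>
    <apply_case>titlecase</apply_case>
    <store_term_or_name>term_tag</store_term_or_name>
    <pattern>(?i)mac\w+</pattern>
    <term>McDuff</term>
  </instance>
</instances>
```

With a file like this, text containing MCDUFF or Mcduff should be stored as McDuff for the Clans entity, per the term_tag description above.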
Note that more than one entity may be defined in a file, since the
<instance> element defining an entity may occur repeatedly.
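For example, a single file defining two pattern-based entities might look like the following sketch. The entity names Surnames and Phone_Number are hypothetical, and the phone-number expression is only an illustrative RE2 pattern for one common digit layout; the Mr\. (\w+) pattern is the one used in the <store_regex_or_name> description above.

```xml
<?xml version="1.0"?>
<instances>
  <!-- Stores only the text of the first capture group (the surname),
       without the "Mr." title -->
  <instance>
    <name>Surnames</name>
    <store_regex_or_name>regex_tagged_as_first_group</store_regex_or_name>
    <pattern>Mr\. (\w+)</pattern>
  </instance>
  <!-- Only the presence of a phone number matters, so the entity
       name is stored instead of the digits matched -->
  <instance>
    <name>Phone_Number</name>
    <store_regex_or_name>name</store_regex_or_name>
    <pattern>\b\d{3}-\d{3}-\d{4}\b</pattern>
  </instance>
</instances>
```

Both patterns use the default RE2 syntax, so on a platform without RE2 support they would need to be rewritten as REX expressions.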