A gbXML language definition file has the following overall structure:

    <language>
      <tokenset>
        <validscope ... />
        <tokens> ... </tokens>
        <tokens2> ... </tokens2>
      </tokenset>
    </language>

A typical language definition file might consist of 5-10 tokenset elements, each with 2-4 validscope elements, one tokens element, and (for scope tokensets that use token pairs) one tokens2 element. gbXML allows a language to have up to 50 tokensets and up to 10 validscopes. Additional details on each element type are provided below.
Tokens and Tokensets

Sometimes a token pair needs to be identified, so that all text found between the two tokens can be given specific formatting. In this case, the language definition may need a second list of tokens: one list for the starting token of each pair and a second for the ending token.

The token list (or lists, if token pairs are involved) is placed within a tokenset element. Tokenset attributes describe the formatting to be applied to its tokens or to the content between token pairs. Formatting information can also be entered on a token-by-token basis, overriding the formatting specified at the tokenset level.

The tokens lists are typically simple multi-line listings of every token to which the formatting will be applied. However, a list of tokens can also be defined with a regular expression. Using a single regular expression to define an entire list of tokens is a powerful way to simplify a language definition file.
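As a rough sketch of the two ideas above, the fragment below combines a tokenset-level color with a per-token override, and defines a second tokens list with a single regular expression. The regexp="yes" attribute is taken from the hyperlink example in the Validscopes section; the forecolor attribute on an individual token, the "Numbers" tokenset, and the numeric-literal pattern are assumptions for illustration, not confirmed gbXML syntax.

```xml
<language name="java">
  <!-- Tokenset-level formatting: every token in this list defaults to red -->
  <tokenset name="Common Words" id="keywords" type="list" forecolor="red">
    <tokens>
      <token>if</token>
      <token>while</token>
      <!-- Hypothetical per-token override: assumes <token> accepts forecolor -->
      <token forecolor="blue">return</token>
    </tokens>
  </tokenset>
  <!-- Hypothetical tokenset: one regular expression stands in for an
       open-ended list of numeric literals such as 42 or 3.14 -->
  <tokenset name="Numbers" id="numbers" type="list" forecolor="green">
    <tokens regexp="yes">
      <token>\b[0-9]+(\.[0-9]+)?\b</token>
    </tokens>
  </tokenset>
</language>
```

Listing every possible number would be impossible, which is exactly the situation where a regular-expression token list pays off.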
Tokenset Types

gbXML supports two kinds of tokensets: list tokensets and scope tokensets. Here's a simple example of a list tokenset with only a few tokens. Assigning properties (attributes) of XML elements will be discussed later.
    <language name="java">
      <tokenset name="Common Words" id="keywords" type="list" forecolor="red">
        <tokens>
          <token>if</token>
          <token>while</token>
          <token>end</token>
        </tokens>
      </tokenset>
    </language>

In this example, three tokens (if, while, end) are defined and will be displayed in red.

The second kind of tokenset, a scope tokenset, is a list of one or more token pairs. Each pair consists of a starting token and an ending token; the specified display characteristics are applied to the source code between the two tokens, as well as to the tokens themselves. For example, most languages use double quotes to enclose strings. A scope tokenset that defines a pair of double-quote tokens would apply its formatting to all source code between the two double-quote characters.

A scope tokenset can include multiple token pairs. The first token of each pair is placed in a tokens element, and the corresponding second token is placed in a tokens2 element. The tokens and tokens2 elements can contain any number of tokens, but they must contain the same number, with each token in tokens paired with the corresponding token in tokens2. Here's an example of a scope tokenset:

    <language name="java">
      <tokenset name="String Tokens" id="strings" type="scope" forecolor="blue">
        <tokens>
          <token>"</token>
          <token>'</token>
        </tokens>
        <tokens2>
          <token>"</token>
          <token>'</token>
        </tokens2>
      </tokenset>
    </language>

In this example, two pairs of tokens are defined, a pair of double quotes and a pair of single quotes, both of which enclose strings in many languages. Source code between either token pair would be colored blue.

gbXML language definition files also support special single-token scope definitions, in which a single token marks the start of a scope and the end of the line marks its end. In such cases, only the tokens element is needed; no tokens2 element is required. For example, Visual Basic uses a single quote to begin a comment.
The end of the line defines the end of the comment scope.
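A Visual Basic comment tokenset written this way might look like the sketch below. It follows the single-token scope rule described above (a tokens element with no tokens2 element); the language name "vb", the tokenset name and id, and the green color are illustrative assumptions.

```xml
<language name="vb">
  <!-- Single-token scope: the comment starts at the single quote and, because
       there is no tokens2 element, runs to the end of the line -->
  <tokenset name="Comments" id="comments" type="scope" forecolor="green">
    <tokens>
      <token>'</token>
    </tokens>
  </tokenset>
</language>
```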
Validscopes

The validscope element specifies that a tokenset is valid (recognized) within the scope of another tokenset. A tokenset may contain any number of validscope elements, although 2-4 validscopes are typically enough to describe most languages. If a tokenset element contains no validscope elements, the tokenset is treated as valid everywhere.

The following XML example indicates that a hyperlink should be recognized within a string scope tokenset. Note that the hyperlink token is defined as a regular expression.
    <language name="java">
      <tokenset name="String Tokens" id="strings" type="scope" forecolor="blue">
        <tokens>
          <token>"</token>
        </tokens>
        <tokens2>
          <token>"</token>
        </tokens2>
      </tokenset>
      <tokenset name="Active Links" id="hyperlinks" type="scope" forecolor="red">
        <validscope name="String Tokens" />
        <tokens regexp="yes">
          <token>https?://([\.~:?#=\w]+\w+)(/[\.~:?#=\w]+\w)*</token>
        </tokens>
      </tokenset>
    </language>

In this example, the hyperlink would be displayed as red text within a blue text string.
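Because a tokenset may list several validscope elements, the same hyperlink tokenset could, for example, be made recognizable inside both strings and comments. The sketch below assumes a comment tokenset named "Comments" is defined elsewhere in the same file (a hypothetical name); the remaining elements mirror the example above.

```xml
<tokenset name="Active Links" id="hyperlinks" type="scope" forecolor="red">
  <!-- Each validscope names a tokenset inside which these tokens are recognized -->
  <validscope name="String Tokens" />
  <!-- Hypothetical second scope: assumes a tokenset named "Comments" exists -->
  <validscope name="Comments" />
  <tokens regexp="yes">
    <token>https?://([\.~:?#=\w]+\w+)(/[\.~:?#=\w]+\w)*</token>
  </tokens>
</tokenset>
```

Removing both validscope elements would, per the rule above, make hyperlinks recognizable everywhere rather than only inside those two scopes.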