String Functions
PowerBASIC has a very wide range of functions that operate on strings or return string content. Here's a categorized list of all the functions.

Unicode vs ANSI Strings
To address the issue that there are far more characters in the world's languages than ASCII supports, a character encoding standard called Unicode, which is kept synchronized with ISO/IEC 10646, has been developed. It defines over 100,000 characters, so a character can need more than 1 byte.

The most common scheme, UTF-8, incorporates the ASCII codes for the first 128 characters. It uses 1 byte each for the ASCII characters and up to 4 bytes for other characters.
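
As a small worked illustration, here is how many bytes UTF-8 uses for a few characters. The sample characters are chosen only for this example.

    Character   Code point   UTF-8 bytes (hex)   Byte count
    A           U+0041       41                  1
    é           U+00E9       C3 A9               2
    €           U+20AC       E2 82 AC            3
    😀          U+1F600      F0 9F 98 80         4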

Unicode has been adopted by both Windows and Unix-like operating systems for internal character encoding, in part because of its compatibility with legacy ASCII encoding.

ANSI continues to be PowerBASIC's default encoding scheme, and its string functions (such as UCase$, LCase$, MCase$, ...) use ANSI encoding. However, PowerBASIC also supports Unicode encoding in two ways. It provides automatic translation to Unicode format for those APIs requiring Unicode character encoding (but only when the string is contained within a variant data type). It also provides the ACode$ and UCode$ functions for converting between the two formats.
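
A minimal sketch of a round trip through the two conversion functions follows. The variable names are only for illustration, and how the Unicode result is stored may differ between PowerBASIC versions, so treat this as an outline rather than definitive usage.

    a$ = "Hello"
    u$ = UCode$(a$)     'ANSI -> Unicode
    b$ = ACode$(u$)     'Unicode -> ANSI, b$ should match a$ again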

Search and Replace
One of the most common actions with strings is to find one string within another: whether it exists and, if so, at what position. Once found, it is also common to want to replace it, remove it, or insert something next to it. PowerBASIC has several functions that satisfy these needs. The next list groups the string functions into categories, with a one-line description of the key features of each function.

Examples of each of these are found below in the reference section.
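
As a quick illustration of the find, replace, and remove actions, the sketch below uses InStr, the Replace statement, and Remove$. The sample strings and variable names are made up for the example.

    a$ = "one two three"
    p& = InStr(a$, "two")           'p& = 5, position of the match (0 if not found)
    Replace "two" With "2" In a$    'a$ = "one 2 three"
    b$ = Remove$(a$, "2 ")          'b$ = "one three"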

Str$, Format$, Using$
PowerBASIC provides three ways to return a string expression representing a number: Str$ (least powerful), Format$ (powerful) and Using$ (most powerful).

Here's the basic syntax of all three.

    result$ = Str$(number, digits%)                    'numbers
    result$ = Format$(number, digits&)                 'numbers, by digit count
    result$ = Format$(number, formatstring$)           'numbers, by format string
    result$ = Using$(mask$,expression)                 'strings/numbers

And here are some examples and comparisons of the three options. Str$ and Format$ are somewhat similar. The mask string formats are very different between Format$ and Using$.
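
The values below are made up for illustration and use only the simplest masks. See the PowerBASIC Help file for the full set of mask characters.

    result$ = Str$(12)                  'result$ = " 12"   leading space in place of a sign
    result$ = Str$(-12)                 'result$ = "-12"
    result$ = Format$(0.2, "0.00")      'result$ = "0.20"
    result$ = Using$("#.##", 3.14159)   'result$ = "3.14"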

LEFT$, RIGHT$, REMAIN$, EXTRACT$
These functions will truncate a string for you. With left$/right$, you specify how many characters to keep from the left or right end. With remain$/extract$, truncation is at the location of a specified character or substring. Extract$ returns the characters to the left of the match and remain$ returns the characters to the right. Both extract$ and remain$ start searching from the left side of the string. Both also allow specification of the starting position in the string.

    a$ = "12345678"
    result$ = left$(a$,5)               'result$ = "12345"
    result$ = right$(a$,5)              'result$ = "45678"
    result$ = remain$(a$,"6")           'result$ = "78"
    result$ = remain$(a$, ANY "36")     'result$ = "45678"
    result$ = extract$(a$,"6")          'result$ = "12345"
    result$ = extract$(a$, ANY "36")    'result$ = "12"

    result$ = remain$(7, a$,"6")        'result$ = ""  not found, returns ""
    result$ = remain$(7, a$, ANY "36")  'result$ = ""  not found, returns ""
    result$ = extract$(7, a$,"6")       'result$ = "78"  
    result$ = extract$(7, a$, ANY "36") 'result$ = "78"

With extract$, if no match is found, all of the string is returned beginning with the starting position. With remain$, if no match is found, none of the string is returned.
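
A common use of this pair is splitting a string at a separator. The sketch below assumes a made-up "key=value" string; the variable names are only for illustration.

    a$ = "name=PowerBASIC"
    key$ = extract$(a$, "=")      'key$ = "name"         left of the first "="
    value$ = remain$(a$, "=")     'value$ = "PowerBASIC"  right of the first "="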

CSET, CSET$, RSET, RSET$, LSET, LSET$
In general, these functions insert and justify (left/right/center) a shorter string within a larger string.

With the Abs argument, the larger string content is unchanged except for the positions where the shorter string is inserted.

Without the Abs argument, the larger string content is replaced with spaces, except for the positions where the shorter string is inserted. The Using argument allows specification of a replacement character other than a space.

The CSET/RSET/LSET statements place one string within an existing string variable, filling the remaining positions with the selected padding character. The CSET$/RSET$/LSET$ functions pad a string with the selected padding character to reach a specified total length and return the result.

These actions are shown in the following examples. Note that the starting value of a$ is assumed in each example, rather than the result of a preceding example.

    a$ = "222333444"     ' 9-character string
    b$ = "---"
    CSet Abs a$ = b$     ' a$ = "222---444"  'original string + "---"
    RSet Abs a$ = b$     ' a$ = "222333---"  'original string + "---"
    LSet Abs a$ = b$     ' a$ = "---333444"  'original string + "---"
    CSet a$ = b$         ' a$ = "   ---   "  'insert string + spaces
    RSet a$ = b$         ' a$ = "      ---"  'insert string + spaces
    LSet a$ = b$         ' a$ = "---      "  'insert string + spaces

    a$ = CSET$("xxx", 7)            ' a$ = "  xxx  "   'pad char is space
    a$ = RSET$("xxx", 7)            ' a$ = "    xxx"   'pad char is space
    a$ = LSET$("xxx", 7)            ' a$ = "xxx    "   'pad char is space
    a$ = CSET$("xxx", 7 Using "*")  ' a$ = "**xxx**"   'pad char is *
    a$ = RSET$("xxx", 7 Using "*")  ' a$ = "****xxx"   'pad char is *
    a$ = LSET$("xxx", 7 Using "*")  ' a$ = "xxx****"   'pad char is *
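
The Using argument can also be combined with the statement forms. The sketch below assumes the statements accept Using in the same position as the functions do, and it reuses the starting values of a$ and b$ from the earlier examples.

    a$ = "222333444"
    b$ = "---"
    CSet a$ = b$ Using "*"   ' a$ = "***---***"  'insert string + pad char *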

BIN$ / HEX$ / OCT$ Functions
PowerBASIC can work with numbers in bases other than 10. In particular, it supports base 2, 8, and 16. Numbers in base 10 are referred to as decimal values. Numbers in other bases are written with one of the following prefixes. Note that PowerBASIC supports 3 ways to represent octal notation.

    &B  - binary
    &O  - octal (that's letter O, not number zero 0)
    &Q  - octal  &Q7
    &   - octal  &7
    &H  - hexadecimal

    A = &H0F    'hex value of "0F" (decimal 15)
    B = &Q7     'octal value of "7" (decimal 7)
    C = &B11    'binary value of "11" (decimal 3)

To convert numeric values to strings, PowerBASIC supplies the BIN$, OCT$, and HEX$ functions. These three functions take a numeric value and return the binary (base 2), octal (base 8), or hexadecimal (base 16) value as a string.

To convert base 2/8/16 strings to numeric (base 10) numbers, use the VAL function. It recognizes the base prefixes listed above (&B, &O, &Q, &H).
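
A short sketch of conversions in both directions follows. The values and variable names are chosen only for illustration.

    a$ = Hex$(255)        'a$ = "FF"
    b$ = Oct$(8)          'b$ = "10"
    c$ = Bin$(5)          'c$ = "101"
    n& = Val("&HFF")      'n& = 255
    m& = Val("&B101")     'm& = 5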

Regular Expressions
All of the PowerBASIC functions which let you find one string within another string require that you explicitly specify the string being searched, or in some cases specify a range of characters. All, that is, except for the regular expression functions.

In addition to searching for specific strings, Regular Expression functions can search for strings which match patterns. The familiar "*.txt" notation for listing files in dialogs is an example of a string pattern. The * is a pattern which stands for any combination of characters.

Regular expressions work in the same way, but with far more pattern options. Patterns can be made up of regular string characters but derive their power from the use of metacharacters. Metacharacters are characters which refer to a pattern, much like the * in the previous example or the characters in format$ or using$ formatting strings. The term "Regular Expressions" refers to the patterns that are defined for the search. Here are the metacharacters supported by PowerBASIC:

Regular Expressions are an extremely powerful tool for search/replace operations. Many useful patterns are very simple, but for more demanding search and replace tasks a regular expression can become fairly complicated and difficult to debug.

See the PowerBASIC Help file for more information on searching and replacing capabilities of regular expressions. Also, regular expressions are widely used, so you'll be able to find a ton of information on the Internet.
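
A minimal sketch of a pattern search follows. It assumes the RegExpr statement takes the form shown here (mask In string To position, length) and that the mask "[0-9]+" matches a run of digits. The sample text and variable names are hypothetical, so check the Help file for the exact syntax in your compiler version.

    a$ = "Order 12345 shipped"
    RegExpr "[0-9]+" In a$ To p&, n&       'p& = match position, n& = match length
    If p& Then found$ = Mid$(a$, p&, n&)   'copy the matched text, if any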

ASCII Character Codes
PowerBASIC supports the ASCII character encoding scheme, which defines 128 characters as 7-bit codes, each stored in 8 bits (1 byte). The ASCII standard was released in 1963 and is now under control of the American National Standards Institute (ANSI).

The PowerBASIC ASC function returns the ASCII decimal code for characters, whereas the PowerBASIC CHR$ function returns the character corresponding to a specified ASCII code.
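
A brief sketch of the two functions follows. The characters and variable names are chosen only for illustration.

    n& = Asc("A")               'n& = 65
    c$ = Chr$(97)               'c$ = "a"
    c$ = Chr$(Asc("a") - 32)    'c$ = "A"   upper and lower case letters differ by 32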

ASCII characters and encoding values are shown in both hexadecimal and decimal format in the following table.

    Char  Dec Hex     Char Dec Hex     Char Dec Hex    Char Dec Hex
    -------------     ------------     ------------    ------------
    (nul)   0  00     (sp) 32  20      @    64  40     `    96  60
    (soh)   1  01     !    33  21      A    65  41     a    97  61
    (stx)   2  02     "    34  22      B    66  42     b    98  62
    (etx)   3  03     #    35  23      C    67  43     c    99  63
    (eot)   4  04     $    36  24      D    68  44     d   100  64
    (enq)   5  05     %    37  25      E    69  45     e   101  65
    (ack)   6  06     &    38  26      F    70  46     f   102  66
    (bel)   7  07     '    39  27      G    71  47     g   103  67
    (bs)    8  08     (    40  28      H    72  48     h   104  68
    (ht)    9  09     )    41  29      I    73  49     i   105  69
    (nl)   10  0a     *    42  2a      J    74  4a     j   106  6a
    (vt)   11  0b     +    43  2b      K    75  4b     k   107  6b
    (np)   12  0c     ,    44  2c      L    76  4c     l   108  6c
    (cr)   13  0d     -    45  2d      M    77  4d     m   109  6d
    (so)   14  0e     .    46  2e      N    78  4e     n   110  6e
    (si)   15  0f     /    47  2f      O    79  4f     o   111  6f
    (dle)  16  10     0    48  30      P    80  50     p   112  70
    (dc1)  17  11     1    49  31      Q    81  51     q   113  71
    (dc2)  18  12     2    50  32      R    82  52     r   114  72
    (dc3)  19  13     3    51  33      S    83  53     s   115  73
    (dc4)  20  14     4    52  34      T    84  54     t   116  74
    (nak)  21  15     5    53  35      U    85  55     u   117  75
    (syn)  22  16     6    54  36      V    86  56     v   118  76
    (etb)  23  17     7    55  37      W    87  57     w   119  77
    (can)  24  18     8    56  38      X    88  58     x   120  78
    (em)   25  19     9    57  39      Y    89  59     y   121  79
    (sub)  26  1a     :    58  3a      Z    90  5a     z   122  7a
    (esc)  27  1b     ;    59  3b      [    91  5b     {   123  7b
    (fs)   28  1c     <    60  3c      \    92  5c     |   124  7c
    (gs)   29  1d     =    61  3d      ]    93  5d     }   125  7d
    (rs)   30  1e     >    62  3e      ^    94  5e     ~   126  7e
    (us)   31  1f     ?    63  3f      _    95  5f     del 127  7f

String Function Listing
Here's a simple listing of the string functions above, with a one-line description of what the function does. Syntax and examples are given in the next section.

String Functions Reference
Here are examples for each of the string functions. The functions are listed in alphabetical order.

If you have any suggestions or corrections, please let me know.