Yate can re-encode text via the Multi Field Editor, via the Yate Transformations submenu on text field context menus, and via the Re-Encode action statement.
The following functions are available. Items followed by a * are available everywhere. Items not followed by a * are only available via the Re-Encode action statement.
- Re-encode to Cyrillic *
Re-encode to Windows code page 1251. See Re-encoding code pages at the end of the document.
- Re-encode to Greek *
Re-encode to Windows code page 1253. See Re-encoding code pages at the end of the document.
- Re-encode to Turkish *
Re-encode to Windows code page 1254. See Re-encoding code pages at the end of the document.
- Re-encode to ASCII *
This function attempts to ensure that every character in the result can be represented as an ASCII character. It does so by changing various characters to similar ASCII equivalents, removing accents where necessary and, as a last resort, changing characters that cannot be represented in ASCII to underscore characters. Unicode UNFC and Fold Characters are also applied.
- Re-encode to ASCII (Lossy)
This function re-encodes as ASCII, discarding characters that cannot be represented. It uses the system implementation and is included for historical reasons; Re-encode to ASCII is usually the better choice.
- Re-encode to ISO Latin-1 *
This function attempts to ensure that every character in the result can be represented as an ISO Latin-1 character. It does so by changing various characters to similar ISO Latin-1 equivalents, removing accents where necessary and, as a last resort, changing characters that cannot be represented in ISO Latin-1 to underscore characters. Unicode UNFC and Fold Characters are also applied.
- Re-encode to ISO Latin-1 (Lossy)
This function re-encodes as ISO Latin-1, discarding characters that cannot be represented. It uses the system implementation and is included for historical reasons; Re-encode to ISO Latin-1 is usually the better choice.
- Re-encode to ISO Latin-2 *
Re-encode to the ISO Latin-2 character encoding.
- Re-encode to WinLatin-1 *
Re-encode to Windows code page 1252. See Re-encoding code pages at the end of the document.
- Re-encode to WinLatin-2 *
Re-encode to Windows code page 1250. See Re-encoding code pages at the end of the document.
- Fold Characters *
This function changes various characters to their similar Latin-1 equivalents. Currently this includes single and double quote equivalents as well as dash/hyphen equivalents. Unicode UNFC is applied. A complete list of the current substitutions can be found here.
- Remove Accents *
This function converts accented characters to their base, unaccented characters wherever possible. Fold Characters is also applied.
- Re-encode to Unicode UNFC *
Unicode can encode most accented characters either as a single precomposed character or as a decomposed sequence (a base character followed by combining marks). For example, precomposed É has a string length of 1, while decomposed É (E followed by a combining acute accent) has a string length of 2. The string displays correctly in either form. When Unicode UNFC is selected, the affected fields are converted to their precomposed form. UNFC stands for Unicode Normalization Form C (usually abbreviated NFC). This transformation should rarely be required.
- Re-encode to Unicode UNFD *
Unicode can encode most accented characters either as a single precomposed character or as a decomposed sequence (a base character followed by combining marks). For example, precomposed É has a string length of 1, while decomposed É (E followed by a combining acute accent) has a string length of 2. The string displays correctly in either form. When Unicode UNFD is selected, the affected fields are converted to their decomposed form. UNFD stands for Unicode Normalization Form D (usually abbreviated NFD). This transformation should rarely be required.
- Escape for JSON
Re-encodes the text so that it can be placed inside a JSON string value. It is intended for string contents only; do not use it to re-encode entire JSON documents.
- Remove RTF Formatting
If the data is properly structured RTF, the formatting is removed, leaving only the text.
- Remove Prompt Markup Sequences
Removes prompt markup sequences. If the source text does not start with `<m>`, it is returned unmodified.
- Remove HTML & Sequences
All HTML & sequences will be replaced with the characters they describe. The full source need not be valid HTML. Invalid & sequences are not modified.
- Add HTML & Sequences
The following transformations will be applied:
| Source Character | Transformed Sequence |
|---|---|
| `&` | `&amp;` |
| `<` | `&lt;` |
| `>` | `&gt;` |
| `"` | `&quot;` |
| `'` | `&#39;` |
- Re-encode to Base64
The source is encoded as Base64.
- Decode Base64
The source is assumed to be Base64 and is decoded. If any errors occur, the returned value will be empty.
- Escape for Regular Expression Pattern
Backslash characters are added to escape characters used in a regular expression pattern.
- Escape for Regular Expression Template
Backslash characters are added to escape characters used in a regular expression replace template.
- Escape for Container Directed Path
The names of items in containers cannot start or end with spaces, and cannot contain a period (.), a double quote ("), a left square bracket ([) or a newline.
If you build a directed path from elements held in variables, those elements may need to be escaped. For example, in `container.<file path>` the file path may contain periods, so it must be escaped.
This function replaces every double quote character with two double quote characters and encloses the entire string in double quotes. For example, ` text"text.text` (with a leading space) becomes `" text""text.text"`.
- Remove URL % Encoding
URL percent encoded sequences are converted back to their textual representation.
- Escape for URL (Custom)
The named variable URL Custom Encode Set is read to determine a list of characters to be percent escaped.
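The Unicode-related functions in the list above (Re-encode to Unicode UNFC and UNFD, Fold Characters, Remove Accents and Re-encode to ASCII) can be approximated with Python's standard `unicodedata` module. This is a minimal sketch under stated assumptions, not Yate's implementation: the `FOLD` table is a hypothetical, abbreviated stand-in for Yate's substitution list, and the ASCII step only removes accents and then falls back to underscores, rather than applying Yate's fuller set of ASCII equivalents.

```python
import unicodedata

# Hypothetical, abbreviated fold table (NOT Yate's actual substitution list):
# curly quotes become straight quotes, en/em dashes become a hyphen.
FOLD = {
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "-",
}


def to_nfc(text: str) -> str:
    """Precomposed form (Unicode Normalization Form C)."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Decomposed form (Unicode Normalization Form D)."""
    return unicodedata.normalize("NFD", text)


def fold_characters(text: str) -> str:
    """Apply NFC, then replace quote and dash variants via FOLD."""
    return "".join(FOLD.get(ch, ch) for ch in to_nfc(text))


def remove_accents(text: str) -> str:
    """Fold, decompose, drop combining marks, then recompose."""
    decomposed = to_nfd(fold_characters(text))
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return to_nfc(base_only)


def to_ascii(text: str) -> str:
    """Remove accents; any character still outside ASCII becomes '_'."""
    return "".join(ch if ord(ch) < 128 else "_" for ch in remove_accents(text))


print(len(to_nfc("E\u0301")), len(to_nfd("\u00c9")))  # 1 2
print(remove_accents("Élan café"))                     # Elan cafe
print(to_ascii("\u201cCrème\u201d \u2013 東"))          # "Creme" - _
```

The first line of output illustrates the string-length difference between the precomposed and decomposed forms of É described above.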
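Several of the escaping functions in the list above correspond closely to Python standard-library calls. The sketch below is illustrative only and assumes UTF-8 text; all function names are hypothetical. Note that Python's `html.escape` writes the apostrophe as `&#x27;`, which may not match Yate's output, and `json.dumps(..., ensure_ascii=False)` is used on the assumption that non-ASCII characters should be left unescaped.

```python
import base64
import html
import json


def escape_for_json(text: str) -> str:
    # json.dumps returns a quoted JSON string; strip the surrounding quotes
    # so the result can be placed inside an existing JSON string value.
    return json.dumps(text, ensure_ascii=False)[1:-1]


def add_html_sequences(text: str) -> str:
    # Escapes &, <, > and, with quote=True, both " and '.
    return html.escape(text, quote=True)


def remove_html_sequences(text: str) -> str:
    # Known character references are decoded; unrecognized ones are left as-is.
    return html.unescape(text)


def encode_base64(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_base64(text: str) -> str:
    # Mirrors the documented behavior: any error yields an empty result.
    # binascii.Error and UnicodeDecodeError are both ValueError subclasses.
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError:
        return ""


def escape_for_container_path(name: str) -> str:
    # Double every double quote, then wrap the whole name in double quotes.
    return '"' + name.replace('"', '""') + '"'


print(escape_for_container_path(' text"text.text'))  # " text""text.text"
print(encode_base64("Yate"))                          # WWF0ZQ==
```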
Escaping components of a URL
Seven further functions percent-encode text according to the characters that must be escaped in a specific component of a URL. Each entry below shows the part of the following example URL that the function is intended for:
http://username:password@www.site.com/index.html?name=value#pagelink
- Escape for URL Host Component
www.site.com
- Escape for URL Path Component
/index.html
- Escape for URL Fragment Component
pagelink
- Escape for URL Password Component
password
- Escape for URL Query Component
name=value
- Re-encode as URL Query Value Component
The value portion of name=value
- Escape for URL User Component
username
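The component-specific functions above differ mainly in which characters they leave unescaped. A rough Python sketch using `urllib.parse.quote`, whose `safe` argument lists extra characters to leave alone, is shown below. The `SAFE` sets are hypothetical approximations chosen for illustration, not the exact character sets Yate uses, and `escape_custom` assumes the URL Custom Encode Set value is simply a string of the characters to escape.

```python
from urllib.parse import quote, unquote

# Hypothetical per-component "safe" sets: characters left unescaped in
# addition to letters, digits and "_.-~", which quote() never escapes.
SAFE = {
    "host": ".:[]",
    "path": "/",
    "fragment": "",
    "password": "",
    "query": "=&",
    "query_value": "",
    "user": "",
}


def escape_for_component(text: str, component: str) -> str:
    return quote(text, safe=SAFE[component])


def escape_custom(text: str, encode_set: str) -> str:
    # Percent-escape only the characters listed in encode_set (as UTF-8 bytes).
    return "".join(quote(ch, safe="") if ch in encode_set else ch for ch in text)


def remove_percent_encoding(text: str) -> str:
    return unquote(text)


print(escape_for_component("/my docs/index file.html", "path"))  # /my%20docs/index%20file.html
print(escape_for_component("rock & roll=fun", "query_value"))    # rock%20%26%20roll%3Dfun
```

Escaping a query value separately from the whole query matters because `&` and `=` act as delimiters in the query component but must be escaped when they occur inside a value.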
Re-encoding code pages
The ID3 specification uses ISO Latin-1 as its 8-bit text encoding. Before UTF-8 was supported, many people wrote MP3 fields in languages whose characters are not supported by ISO Latin-1. When Yate reads these files, fields that declare an ISO Latin-1 encoding but actually contain text in another encoding do not display the correct characters.
These functions let you specify the original encoding and attempt to convert the text to the encoding it was actually written in. Modifications are made wherever possible. Note that if a field currently contains characters that cannot be represented in ISO Latin-1, no modifications occur.
In essence, the algorithm converts the Mac's internal representation of the string back into ISO Latin-1 bytes and then decodes those raw bytes using the specified encoding.
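The round trip described above can be sketched in Python. This is an illustration under stated assumptions, not Yate's code: `reencode_from_latin1` is a hypothetical name, Python's codec names (`cp1251`, `cp1253`, `cp1254`, `cp1252`, `cp1250`) stand in for the Windows code pages, and text that cannot be represented in ISO Latin-1 (or whose bytes are undefined in the target code page) is returned unchanged.

```python
def reencode_from_latin1(text: str, code_page: str) -> str:
    """Reinterpret text that was wrongly decoded as ISO Latin-1.

    code_page is a Python codec name such as "cp1251" (Cyrillic),
    "cp1253" (Greek), "cp1254" (Turkish), "cp1252" (WinLatin-1)
    or "cp1250" (WinLatin-2).
    """
    try:
        raw = text.encode("latin-1")  # recover the original raw bytes
    except UnicodeEncodeError:
        return text  # not representable in ISO Latin-1: leave unchanged
    try:
        return raw.decode(code_page)  # read those bytes as the real encoding
    except UnicodeDecodeError:
        return text  # a byte is undefined in the target code page


# "Привет" stored as Windows-1251 bytes but displayed as ISO Latin-1:
garbled = "Привет".encode("cp1251").decode("latin-1")
print(garbled)                                  # Ïðèâåò
print(reencode_from_latin1(garbled, "cp1251"))  # Привет
```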