Xfce Foundation Classes


Xfc::String Class Reference

A UTF-8 standard string compatible string class.

#include <xfc/utfstring.hh>

Public Types

Public Member Functions

Constructors
Accessors
Forward iterators
Reverse iterators
Append characters to the end of the string
Assign a new value to the string
Compare two strings
The compare methods return 0 if the two strings are the same, a negative number if this string is lexicographically before the comparison string and a positive number otherwise.

Copy the contents of the string into a buffer.
The copy methods return the number of bytes copied into the buffer.

Erase characters from the string
The iterator returned by the erase methods points to the character one past the last character removed.

Insert characters into the string
Replace characters in the string
Resize the string
Search the string for a subsequence
The find methods search the string for a subsequence and if successful return the byte index of the first occurrence found, otherwise npos is returned.

Select a substring
Swap strings
Case conversion
UTF-8 methods
Validate the characters in the string.

Static Public Member Functions

Accessors
Format a new value for the string
UTF-8 filename conversion methods
UTF-8 character set conversion methods

Static Public Attributes


Detailed Description

A UTF-8 standard string compatible string class.

String is a custom UTF-8 string class that provides standard string interoperability. It is implemented using a standard string as an internal byte array but uses its own iterators, Forward_StringIterator and Reverse_StringIterator. To keep the syntax consistent with standard string, the String class typedefs these iterators as iterator and reverse_iterator respectively. The const prefix has been omitted since there are no non-const iterators. String presents an interface similar to standard string but with extra method wrappers for GLib's UTF-8 functions. One important difference is the values returned by length() and size(): the length is the number of UTF-8 characters in the string, whereas the size is the number of bytes those characters occupy. Remember that in UTF-8 strings a character can span multiple bytes.
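The difference between length and size can be illustrated without XFC. The following is a minimal sketch in standard C++, not the XFC implementation; the helper name utf8_length is hypothetical, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts UTF-8 characters by counting every byte that is not a
// continuation byte (bit pattern 10xxxxxx). Assumes valid UTF-8 input.
std::size_t utf8_length(const std::string& bytes)
{
    std::size_t n_chars = 0;
    for (unsigned char byte : bytes) {
        if ((byte & 0xC0) != 0x80)
            ++n_chars;
    }
    assert(n_chars <= bytes.size());  // a character occupies at least one byte
    return n_chars;
}
```

For the byte string "h\xC3\xA9llo" (the word "héllo"), std::string::size() reports 6 bytes while utf8_length() reports 5 characters, because the accented letter occupies two bytes.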


Constructor & Destructor Documentation

Xfc::String::String()

Creates an empty string.

Creates an empty String with no characters. This string can be represented as "" and is therefore never null.

Xfc::String::String(const String& str)

Copy constructor.

Parameters:
str A String containing valid UTF-8 characters.
Create a new String from str. This String is never null.

Xfc::String::String(const String& str, size_t char_pos, size_t n_chars = npos)

Creates a string that is a substring of str.

Parameters:
str A String containing valid UTF-8 characters.
char_pos The starting character position.
n_chars The maximum number of UTF-8 characters to read.
Create a new String that is a substring of str. The substring begins at char_pos and contains at most n_chars characters. If n_chars is npos the substring contains all the remaining characters in str. This String is never null.

Xfc::String::String(const std::string& str)

Creates a string that is a copy of the standard string str.

Parameters:
str A standard string containing valid UTF-8 characters.
Create a new String from str. This String is never null.

Xfc::String::String(const std::string& str, size_t n_chars)

Creates a string that is a substring of the standard string str.

Parameters:
str A standard string containing valid UTF-8 characters.
n_chars The maximum number of UTF-8 characters to read.
Create a new String that is a substring of str. The substring contains at most n_chars characters. If n_chars is npos the substring contains all the characters in str. This String is never null.

Xfc::String::String(const char* s, size_t n_chars)

Creates a string that is a substring of the characters in array s.

Parameters:
s A UTF-8 character string.
n_chars The maximum number of UTF-8 characters to read.
Create a new String whose contents are the first n_chars characters pointed to by s, a UTF-8 character string. If n_chars is npos the string contains all the characters pointed to by s. This String is never null.

Xfc::String::String(const char* s)

Creates a string that is a copy of the characters in array s.

Parameters:
s A UTF-8 character string.
Create a new String whose contents are equal to the array of UTF-8 characters pointed to by s. If s is null, the new String will be null and the null() method will return true. Calling c_str() on a null String returns a null pointer, whereas calling c_str() on an empty String returns a pointer to a character string whose only character is the null terminator (i.e. ""). This is the only constructor that creates a null String. The concept of a null String exists so that a C string, whose value may or may not be null, can be used to initialize a new String, preserving its null state. Then, when you pass c_str() to a C function, a null pointer is passed if the String is null, not a pointer to an empty string.
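The null-versus-empty distinction can be sketched with a small stand-alone wrapper. This is an illustration only, assuming a hypothetical class NullableString that mimics the behaviour described above; it is not how XFC implements String.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration only; not the XFC implementation.
class NullableString
{
    std::string bytes_;
    bool is_null_;

public:
    // Default construction yields an empty, non-null string.
    NullableString() : bytes_(), is_null_(false) {}

    // A null C string is remembered as null rather than stored as "".
    NullableString(const char* s) : bytes_(s ? s : ""), is_null_(s == nullptr) {}

    bool null() const { return is_null_; }
    bool empty() const { return bytes_.empty(); }

    // Returns a null pointer for a null string, otherwise a
    // NUL-terminated buffer (possibly "").
    const char* c_str() const { return is_null_ ? nullptr : bytes_.c_str(); }
};

// Usage sketch: the null state survives a round trip through the wrapper.
inline void nullable_string_example()
{
    const char* from_c_api = nullptr;  // e.g. an optional value from a C library
    NullableString s(from_c_api);
    assert(s.null() && s.c_str() == nullptr);

    NullableString e("");
    assert(!e.null() && e.c_str()[0] == '\0');
}
```

Keeping a separate flag, rather than testing for an empty buffer, is what lets the empty string and the null string remain distinguishable.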

Xfc::String::String(size_t n, char c)

Creates a string with n copies of the ASCII character c.

Parameters:
n The number of copies of character c.
c An ASCII character.
Create a new String with n copies of the ASCII character c. This String is never null.

Xfc::String::String(size_t n, gunichar c)

Creates a string with n copies of the unicode character c.

Parameters:
n The number of copies of character c.
c A UCS-4 unicode character.
Create a new String with n copies of the unicode character c. This String is never null.

Xfc::String::String(const gunichar* s, int n_chars, G::Error* error = 0)

Creates a string that is a substring of the unicode string s.

Parameters:
s A UCS-4 unicode character string.
n_chars The maximum number of UCS-4 unicode characters to read.
error return location for an allocated G::Error, or null to ignore errors.
Create a new String that is a substring of the UCS-4 unicode character string s. The substring contains at most n_chars unicode characters converted to UTF-8 characters. If n_chars is npos the substring contains all the characters in s. This String is never null.

Xfc::String::String(iterator first, iterator last)

Creates a string with the characters in the range first to last.

Parameters:
first An iterator pointing to the first byte of a UTF-8 character.
last An iterator pointing to the first byte of a UTF-8 character.
Create a new String by reading all the characters in the range first to last. This String is never null.


Member Function Documentation

G::Unichar Xfc::String::at(size_t char_pos) const

Calls get_char_validated() to return the character at char_pos as a unicode character.

Parameters:
char_pos The integer character offset.
Returns:
The unicode character at char_pos.

String Xfc::String::casefold(size_t n_bytes = npos)

Converts a string into a form that is independent of case.

Parameters:
n_bytes The length in bytes, or npos for the entire string.
Returns:
A new string, that is a case independent form of the string.
The result will not correspond to any particular case, but can be compared for equality or ordered with the results of calling casefold() on other strings. Note that calling casefold() followed by collate() is only an approximation to the correct linguistic case insensitive ordering, though it is a fairly good one. Getting this exactly right would require a more sophisticated collation function that takes case sensitivity into account. Currently, such a function is not provided.

int Xfc::String::collate(const String& str)

Compare the string and str for ordering using the linguistically correct rules for the current locale.

Parameters:
str A UTF-8 encoded string.
Returns:
-1 if the string compares before str, 0 if they compare equal, 1 if the string compares after str.
When sorting a large number of strings, it is significantly faster to compare them with collate_key().

int Xfc::String::collate_key(const String& str, size_t n_bytes = npos)

Converts the string and str into collation keys and compares them using strcmp().

Parameters:
str A UTF-8 encoded string.
n_bytes The length in bytes, or npos for the entire string.
Returns:
-1 if the string compares before str, 0 if they compare equal, 1 if the string compares after str.
The results of comparing the two strings with collate_key will always be the same as comparing the two strings with collate().
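The idea behind collation keys, transforming each string once and then comparing the keys with strcmp(), can be sketched with the C library's std::strxfrm. This is a stand-in chosen for illustration, not the GLib routine that XFC wraps; the helper names make_collation_key and sort_by_collation_key are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Builds a collation key with std::strxfrm, so that comparing two keys with
// std::strcmp orders them as std::strcoll would order the original strings.
std::string make_collation_key(const std::string& text)
{
    // With a null buffer and a count of 0, strxfrm only reports the key length.
    const std::size_t key_len = std::strxfrm(nullptr, text.c_str(), 0);
    std::vector<char> buffer(key_len + 1);
    std::strxfrm(buffer.data(), text.c_str(), buffer.size());
    return std::string(buffer.data());
}

typedef std::pair<std::string, std::string> KeyedString;  // (key, original)

// Sorts by precomputed keys: one transformation per string instead of one
// locale-aware comparison per pair examined by the sort.
void sort_by_collation_key(std::vector<std::string>& items)
{
    std::vector<KeyedString> keyed;
    keyed.reserve(items.size());
    for (const std::string& item : items)
        keyed.push_back(KeyedString(make_collation_key(item), item));
    assert(keyed.size() == items.size());

    std::sort(keyed.begin(), keyed.end(),
              [](const KeyedString& a, const KeyedString& b) {
                  return std::strcmp(a.first.c_str(), b.first.c_str()) < 0;
              });

    for (std::size_t i = 0; i < items.size(); ++i)
        items[i] = keyed[i].second;
}
```

The ordering produced depends on the active C locale; in the default "C" locale the keys compare byte by byte.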

String Xfc::String::convert(const std::string& str, const char* from_codeset, G::Error* error = 0) [static]

Converts a string from one character set to UTF-8.

Parameters:
str The string to convert.
from_codeset The character set of str.
error The location to store any error, or null to ignore errors.
Returns:
The converted String if successful, otherwise a null string and error will be set.
Any of the errors in GConvertError may occur.

std::string Xfc::String::convert(const char* to_codeset, G::Error* error = 0)

Converts the string from UTF-8 to another character set.

Parameters:
to_codeset The character set to convert the string to.
error The location to store any error, or null to ignore errors.
Returns:
The converted string if successful, otherwise an empty string and error will be set.
Any of the errors in GConvertError may occur.

String Xfc::String::convert_with_fallback(const std::string& str, const char* from_codeset, const char* fallback, G::Error* error = 0) [static]

Converts a string from one character set to UTF-8, possibly including fallback sequences for characters not representable in the output.

Parameters:
str The string to convert.
from_codeset The character set of str.
fallback A UTF-8 string to use in place of a character not present in the target encoding.
error The location to store any error, or null to ignore errors.
Returns:
The converted String if successful, otherwise a null string and error will be set.
The fallback string must be in the target encoding. Characters not in the target encoding are represented as Unicode escapes \x{XXXX} or \x{XXXXXX}. Note that it is not guaranteed that the specification for the fallback sequences will be honored. Some systems may do an approximate conversion from from_codeset to UTF-8 in their iconv() functions, in which case GLib will simply return that approximate conversion.

Any of the errors in GConvertError may occur.

String Xfc::String::convert_with_fallback(const std::string& str, const char* from_codeset, G::Error* error = 0) [static]

Converts a string from one character set to UTF-8, possibly including fallback sequences for characters not representable in the output.

Parameters:
str The string to convert.
from_codeset The character set of str.
error The location to store any error, or null to ignore errors.
Returns:
The converted String if successful, otherwise a null string and error will be set.
This method uses a default fallback string in place of a character not present in the target encoding. Characters not in the target encoding are represented as Unicode escapes \x{XXXX} or \x{XXXXXX}. Note that it is not guaranteed that the specification for the fallback sequences will be honored. Some systems may do an approximate conversion from from_codeset to UTF-8 in their iconv() functions, in which case GLib will simply return that approximate conversion.

Any of the errors in GConvertError may occur.

std::string Xfc::String::convert_with_fallback(const char* to_codeset, const char* fallback, G::Error* error = 0)

Converts the string from UTF-8 to another character set, possibly including fallback sequences for characters not representable in the output.

Parameters:
to_codeset The character set to convert the string to.
fallback A UTF-8 string to use in place of a character not present in the target encoding.
error The location to store any error, or null to ignore errors.
Returns:
The converted string if successful, otherwise an empty string and error will be set.
The fallback string must be representable in the target encoding. Characters not in the target encoding are represented as Unicode escapes \x{XXXX} or \x{XXXXXX}. Note that it is not guaranteed that the specification for the fallback sequences will be honored. Some systems may do an approximate conversion from UTF-8 to to_codeset in their iconv() functions, in which case GLib will simply return that approximate conversion.

Any of the errors in GConvertError may occur.

std::string Xfc::String::convert_with_fallback(const char* to_codeset, G::Error* error = 0)

Converts the string from UTF-8 to another character set, possibly including fallback sequences for characters not representable in the output.

Parameters:
to_codeset The character set to convert the string to.
error The location to store any error, or null to ignore errors.
Returns:
The converted string if successful, otherwise an empty string and error will be set.
This method uses a default fallback string in place of a character not present in the target encoding. Characters not in the target encoding are represented as Unicode escapes \x{XXXX} or \x{XXXXXX}. Note that it is not guaranteed that the specification for the fallback sequences will be honored. Some systems may do an approximate conversion from UTF-8 to to_codeset in their iconv() functions, in which case GLib will simply return that approximate conversion.

Any of the errors in GConvertError may occur.

String Xfc::String::format(const char* message_format, ...) [static]

Convenience method that does sprintf-style string formatting.

Parameters:
message_format The format string (see the printf() documentation).
... A variable list of arguments to insert in the output.
Returns:
A String that holds the formatted output.
Example: Formatting a string.
             String str = String::format("This is a %s string.", "formatted");
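For readers curious how sprintf-style formatting into a growable string is commonly done, here is a minimal sketch in standard C++. It is not the XFC implementation; the function name format_string is hypothetical and it returns a std::string rather than a String.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats printf-style arguments into a std::string using two passes:
// one to measure the output, one to write it.
std::string format_string(const char* message_format, ...)
{
    assert(message_format != nullptr);

    va_list args;
    va_start(args, message_format);

    // First pass: measure. vsnprintf consumes its va_list, so use a copy.
    va_list args_copy;
    va_copy(args_copy, args);
    const int needed = std::vsnprintf(nullptr, 0, message_format, args_copy);
    va_end(args_copy);

    std::string result;
    if (needed > 0) {
        // Second pass: write into a buffer with room for the terminator.
        std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(buffer.data(), buffer.size(), message_format, args);
        result.assign(buffer.data(), static_cast<std::size_t>(needed));
    }
    va_end(args);
    return result;
}
```

Called as format_string("This is a %s string.", "formatted"), the sketch returns "This is a formatted string.". Measuring first avoids guessing a buffer size.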

String Xfc::String::from_filename(const std::string& opsysstring, G::Error* error = 0) [static]

Converts a string from the encoding used for filenames into a UTF-8 string.

Parameters:
opsysstring A string in the encoding for filenames.
error The location to store any error that occurs, or null to ignore errors.
Returns:
The converted String.
Any of the errors in GConvertError may occur.

String Xfc::String::from_locale(const std::string& opsysstring, G::Error* error = 0) [static]

Convert the opsysstring from the current locale's encoding used by the C runtime into a UTF-8 string.

Parameters:
opsysstring A string in the encoding of the current locale.
error The location to store any error that occurs, or null to ignore errors.
Returns:
The converted String.
Any of the errors in GConvertError may occur.

G::Unichar Xfc::String::get_char(const_pointer p) [static]

Converts the UTF-8 byte sequence at p to a unicode character.

Parameters:
p A constant pointer to a UTF-8 byte sequence.
Returns:
A unicode character or (gunichar)-1 if the unicode character is invalid.

G::Unichar Xfc::String::get_char(size_t char_pos) const

Converts the UTF-8 byte sequence at char_pos to a unicode character.

Parameters:
char_pos The character position.
Returns:
A unicode character or (gunichar)-1 if the unicode character is invalid.

G::Unichar Xfc::String::get_char_validated(const_pointer p, size_t n_bytes) [static]

Converts the UTF-8 byte sequence at p to a unicode character.

Parameters:
p A constant pointer to a UTF-8 byte sequence.
n_bytes The maximum number of bytes to read, or npos, for no maximum.
Returns:
A unicode character.
This method checks for incomplete and invalid characters. It returns (gunichar)-2 if the byte sequence at p is a partial sequence that could begin a valid character. It returns (gunichar)-1 if the byte sequence at p does not contain a valid UTF-8 encoded unicode character.

G::Unichar Xfc::String::get_char_validated(size_t char_pos, size_t n_bytes = npos) const

Converts the UTF-8 byte sequence at char_pos to a unicode character.

Parameters:
char_pos The character position.
n_bytes The maximum number of bytes to read, or npos, for no maximum.
Returns:
A unicode character.
This method checks for incomplete and invalid characters. It returns (gunichar)-2 if the character at char_pos contains a partial byte sequence that could begin a valid character. It returns (gunichar)-1 if the character at char_pos does not contain a valid UTF-8 encoded unicode character.
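The two sentinel results can be made concrete with a stand-alone decoder. The sketch below is written in standard C++ under stated assumptions: the names decode_utf8, kInvalid and kPartial are hypothetical, n_bytes is always an explicit byte count (there is no npos handling), and the code is not taken from XFC or GLib.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::uint32_t kInvalid = static_cast<std::uint32_t>(-1);  // not valid UTF-8
const std::uint32_t kPartial = static_cast<std::uint32_t>(-2);  // truncated sequence

// Decodes one UTF-8 sequence starting at p, reading at most n_bytes bytes.
std::uint32_t decode_utf8(const char* p, std::size_t n_bytes)
{
    assert(p != nullptr);
    if (n_bytes == 0)
        return kPartial;

    const unsigned char lead = static_cast<unsigned char>(p[0]);
    if (lead < 0x80)
        return lead;  // single-byte (ASCII) character

    std::size_t len = 0;
    std::uint32_t code_point = 0;
    std::uint32_t min_value = 0;  // smallest code point that needs `len` bytes
    if ((lead & 0xE0) == 0xC0)      { len = 2; code_point = lead & 0x1F; min_value = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; code_point = lead & 0x0F; min_value = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; code_point = lead & 0x07; min_value = 0x10000; }
    else return kInvalid;  // stray continuation byte or invalid lead byte

    for (std::size_t i = 1; i < len; ++i) {
        if (i == n_bytes)
            return kPartial;  // ran out of input in the middle of a sequence
        const unsigned char byte = static_cast<unsigned char>(p[i]);
        if ((byte & 0xC0) != 0x80)
            return kInvalid;
        code_point = (code_point << 6) | (byte & 0x3F);
    }

    const bool overlong = code_point < min_value;
    const bool surrogate = code_point >= 0xD800 && code_point <= 0xDFFF;
    if (overlong || surrogate || code_point > 0x10FFFF)
        return kInvalid;
    return code_point;
}

// Usage sketch: a complete, a truncated and a malformed sequence.
inline void decode_utf8_example()
{
    assert(decode_utf8("\xE2\x82\xAC", 3) == 0x20AC);  // U+20AC EURO SIGN
    assert(decode_utf8("\xE2\x82", 2) == kPartial);    // first two bytes only
    assert(decode_utf8("\x80", 1) == kInvalid);        // lone continuation byte
}
```

Distinguishing "partial" from "invalid" matters when data arrives in chunks: a partial result means more bytes may complete the character, whereas an invalid result means the data is malformed.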

size_t Xfc::String::index(size_t char_pos) const

Converts the character offset char_pos to an integer byte index.

Parameters:
char_pos The character offset.
Returns:
The integer byte index corresponding to char_pos.
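The mapping from a character offset to a byte index can be sketched as a linear scan that skips continuation bytes. This is an illustration in standard C++, assuming valid UTF-8 input; the helper name utf8_byte_index is hypothetical, and returning std::string::npos for an out-of-range offset is a choice made for this sketch, not documented XFC behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the byte index at which character number char_pos starts.
// An offset equal to the character count maps to bytes.size(); a larger
// offset yields std::string::npos. Assumes valid UTF-8 input.
std::size_t utf8_byte_index(const std::string& bytes, std::size_t char_pos)
{
    std::size_t chars_seen = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(bytes[i]);
        if ((byte & 0xC0) == 0x80)
            continue;  // continuation byte: still inside the same character
        if (chars_seen == char_pos)
            return i;
        ++chars_seen;
    }
    return chars_seen == char_pos ? bytes.size() : std::string::npos;
}

// Usage sketch with "héllo", whose second character occupies two bytes.
inline void utf8_byte_index_example()
{
    const std::string text = "h\xC3\xA9llo";
    assert(utf8_byte_index(text, 1) == 1);  // 'é' starts at byte 1
    assert(utf8_byte_index(text, 2) == 3);  // first 'l' starts at byte 3
}
```

Because the scan is linear in the size of the string, repeated random access by character offset is more costly than access by byte index.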

size_t Xfc::String::length() const

Returns the length of the string in characters.

For a string containing only ASCII characters, the length of the string is the same as its size. If the string contains multi-byte UTF-8 characters, the length is less than the size.

String Xfc::String::normalize(GNormalizeMode mode, size_t n_bytes = npos)

Converts a string into canonical form.

Parameters:
mode The type of normalization to perform.
n_bytes The length in bytes, or npos for the entire string.
Returns:
A new string, that is the normalized form of the string.
Converts a string into canonical form, standardizing such issues as whether a character with an accent is represented as a base character and combining accent or as a single precomposed character. You should generally call normalize() before comparing two Unicode strings. The normalization mode G_NORMALIZE_DEFAULT only standardizes differences that do not affect the text content, such as the above-mentioned accent representation. G_NORMALIZE_ALL also standardizes the "compatibility" characters in Unicode, mapping them to their standard forms (for example, SUPERSCRIPT THREE to DIGIT THREE). Formatting information may be lost, but for most text operations such characters should be considered the same. For example, collate() normalizes with G_NORMALIZE_ALL as its first step. G_NORMALIZE_DEFAULT_COMPOSE and G_NORMALIZE_ALL_COMPOSE are like G_NORMALIZE_DEFAULT and G_NORMALIZE_ALL, but return a result with composed forms rather than a maximally decomposed form. This is often useful if you intend to convert the string to a legacy encoding or pass it to a system with less capable Unicode handling.

size_t Xfc::String::offset(const_pointer p) const

Converts a constant pointer to a position within the string to an integer character offset.

Parameters:
p A constant pointer to a byte position within the string.
Returns:
The integer character offset.

G::Unichar Xfc::String::operator[](size_t char_pos) const

Calls get_char() to return the character at char_pos as a unicode character.

Parameters:
char_pos The integer character offset.
Returns:
The unicode character at char_pos.

const_pointer Xfc::String::pointer(size_t char_pos) const

Converts an integer character offset to a constant pointer to a position within the string.

Parameters:
char_pos The integer character offset.
Returns:
A constant pointer to a byte position within the string.

size_t Xfc::String::size() const [inline]

Returns the size of the string in bytes.

For a string containing only ASCII characters, the size of the string is the same as its length. If the string contains multi-byte UTF-8 characters, the size is greater than the length.

const std::string& Xfc::String::str() const [inline]

Returns a const reference to the internal std::string.

This method can be used to pass a String to a function expecting a standard string.

std::string Xfc::String::to_filename(G::Error* error = 0) const

Converts a string from UTF-8 to the encoding used for filenames.

Parameters:
error The location to store any error that occurs, or null to ignore errors.
Returns:
The converted string.
Any of the errors in GConvertError may occur.

std::string Xfc::String::to_locale(G::Error* error = 0) const

Converts the string from UTF-8 to the encoding used by the C runtime for the current locale.

Parameters:
error The location to store any error that occurs, or null to ignore errors.
Returns:
The converted string.
Any of the errors in GConvertError may occur.

bool Xfc::String::validate(const_pointer* end = 0) const

Validates UTF-8 encoded text.

Parameters:
end On return, points to the first invalid byte or to the end of the string.
Returns:
true if the entire string is valid UTF-8.
Many routines require valid UTF-8 as input; so data read from a file or the network should be checked with validate() before doing anything else with it.

bool Xfc::String::validate(size_t& byte_pos) const

Validates UTF-8 encoded text.

Parameters:
byte_pos The location to store the byte index of the first invalid byte.
Returns:
true if the entire string is valid UTF-8.
Many routines require valid UTF-8 as input; so data read from a file or the network should be checked with validate() before doing anything else with it.
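What "validating" involves can be sketched as a scan that checks each sequence for a legal lead byte, the right number of continuation bytes, and a code point that is neither overlong, a surrogate, nor above U+10FFFF. The following is a stand-alone illustration in standard C++, not the GLib routine that XFC wraps; the function name validate_utf8 is hypothetical, and reporting bytes.size() in byte_pos on success is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8. byte_pos receives the index
// of the first byte of the offending sequence, or bytes.size() on success.
bool validate_utf8(const std::string& bytes, std::size_t& byte_pos)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        unsigned long code_point = 0;
        unsigned long min_value = 0;  // smallest code point that needs `len` bytes

        if (lead < 0x80)                { len = 1; code_point = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; code_point = lead & 0x1F; min_value = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; code_point = lead & 0x0F; min_value = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; code_point = lead & 0x07; min_value = 0x10000; }
        else break;  // stray continuation byte or invalid lead byte

        if (len > bytes.size() - i)
            break;   // sequence is cut off by the end of the string

        bool well_formed = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char byte = static_cast<unsigned char>(bytes[i + k]);
            if ((byte & 0xC0) != 0x80) { well_formed = false; break; }
            code_point = (code_point << 6) | (byte & 0x3F);
        }
        const bool surrogate = code_point >= 0xD800 && code_point <= 0xDFFF;
        if (!well_formed || code_point < min_value || surrogate || code_point > 0x10FFFF)
            break;

        i += len;
    }
    assert(i <= bytes.size());
    byte_pos = i;
    return i == bytes.size();
}
```

For the input "ab\xFFzz" the sketch returns false with byte_pos set to 2, since 0xFF can never appear in UTF-8.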


The documentation for this class was generated from the following file: Xfce Foundation Classes
Copyright © 2004-2005 The XFC Development Team XFC 4.3