LDAP SDK Character Sets


Character Set Manipulation

The functions described in this section are specific to this implementation.

UTF-8 is the variable-length encoding of Unicode characters used by LDAPv3 implementations. An advantage of UTF-8 is that US-ASCII characters keep their US-ASCII values, each encoded as a single byte, so a pure US-ASCII string is already a valid UTF-8 string.


	int ldap_utf8_strcasecmp (
		const unsigned char *s1,
		const unsigned char *s2
	);

	int ldap_utf8_charlen (
		const unsigned char *str
	);

	unsigned char* ldap_utf8_nextchar (
		const unsigned char *str
	);

	int ldap_utf8_isspace (const unsigned char *str);

	int ldap_utf8_toupper (
		unsigned char *dst, 
		const unsigned char *src
	);

	int ldap_utf8_charcpy (
		unsigned char *dst, 
		const unsigned char *src
	);

ldap_utf8_strcasecmp() compares two UTF-8 strings. It ignores the case of any US-ASCII letters present in the strings; the case of characters outside US-ASCII is significant. In the manner of strcmp(), the function returns -1, 0 or 1, where 0 means the strings are equal.
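The comparison rule described above can be sketched in portable C. This is an illustration only, not the SDK's implementation: the names sketch_ascii_lower and sketch_utf8_strcasecmp are hypothetical. It relies on the fact that every byte of a multi-byte UTF-8 character is 0x80 or greater, so folding only the bytes 'A' to 'Z' never alters a non-ASCII character.

```c
/* Hypothetical helper: fold only US-ASCII upper-case letters.
 * Every other byte, including all bytes >= 0x80, is returned unchanged. */
static unsigned char sketch_ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Hypothetical stand-in for ldap_utf8_strcasecmp(): byte-wise comparison
 * after ASCII case folding. Returns -1, 0 or 1. */
static int sketch_utf8_strcasecmp(const unsigned char *s1,
                                  const unsigned char *s2)
{
    for (;; s1++, s2++) {
        unsigned char a = sketch_ascii_lower(*s1);
        unsigned char b = sketch_ascii_lower(*s2);

        if (a != b)
            return a < b ? -1 : 1;
        if (a == '\0')
            return 0;   /* both strings ended together */
    }
}
```

With this sketch, two strings that differ only in the case of US-ASCII letters compare equal, while an upper-case and a lower-case accented letter compare as different.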

ldap_utf8_charlen() returns the number of bytes that form the character encoded beginning at str.
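For readers unfamiliar with the encoding, the length of a UTF-8 character can be derived from its first byte alone. The following is a minimal sketch of that rule; sketch_utf8_charlen is a hypothetical name, and the return value of 0 for an invalid lead byte is an assumption of the sketch, not documented behavior of ldap_utf8_charlen().

```c
/* Hypothetical helper, not part of the SDK: derive the encoded length of
 * one UTF-8 character from its lead byte. Returns 0 for a byte that
 * cannot start a character (assumed convention for this sketch). */
static int sketch_utf8_charlen(const unsigned char *str)
{
    unsigned char c = str[0];

    if (c < 0x80)           return 1;  /* 0xxxxxxx: US-ASCII          */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx: two-byte lead     */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx: three-byte lead   */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx: four-byte lead    */
    return 0;                          /* continuation or invalid     */
}
```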

ldap_utf8_nextchar() returns a pointer to the beginning of the character following the character starting at str. It can be used to traverse a UTF-8 encoded string character by character.
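The traversal pattern can be illustrated with a self-contained sketch that counts characters rather than bytes. The names sketch_utf8_next and sketch_utf8_count are hypothetical; sketch_utf8_next only approximates what ldap_utf8_nextchar() is described as doing, and stopping at the terminating NUL is a choice made for this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for ldap_utf8_nextchar(): skip the lead byte,
 * then any continuation bytes (10xxxxxx) that follow it. */
static const unsigned char *sketch_utf8_next(const unsigned char *p)
{
    if (*p == '\0')
        return p;               /* never step past the terminator */
    p++;
    while ((*p & 0xC0) == 0x80)
        p++;
    return p;
}

/* Count the characters in a NUL-terminated UTF-8 string by walking it
 * one character at a time. */
static size_t sketch_utf8_count(const unsigned char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s = sketch_utf8_next(s))
        n++;
    return n;
}
```

A five-byte string holding "caf" followed by a two-byte accented letter is counted as four characters.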

ldap_utf8_isspace() returns 1 if the character beginning at str is the UTF-8 encoding of a POSIX space character or of an ISO 10646 special non-visible space character, and 0 otherwise.
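A test of this kind can be sketched as follows. This section does not list which ISO 10646 space characters the SDK recognizes, so the set used here (U+00A0, U+2000 through U+200A, and U+3000) is an assumption chosen for illustration, and sketch_utf8_isspace is a hypothetical name.

```c
/* Hypothetical space test on the UTF-8 character beginning at s.
 * The non-ASCII code points checked are an assumed, illustrative set. */
static int sketch_utf8_isspace(const unsigned char *s)
{
    /* Space, tab, newline, vertical tab, form feed, carriage return. */
    if (s[0] == ' ' || (s[0] >= '\t' && s[0] <= '\r'))
        return 1;
    /* U+00A0 NO-BREAK SPACE is encoded as C2 A0. */
    if (s[0] == 0xC2 && s[1] == 0xA0)
        return 1;
    /* U+2000 through U+200A are encoded as E2 80 80 through E2 80 8A. */
    if (s[0] == 0xE2 && s[1] == 0x80 && s[2] >= 0x80 && s[2] <= 0x8A)
        return 1;
    /* U+3000 IDEOGRAPHIC SPACE is encoded as E3 80 80. */
    if (s[0] == 0xE3 && s[1] == 0x80 && s[2] == 0x80)
        return 1;
    return 0;
}
```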

ldap_utf8_toupper() copies a single character from src to dst. In the process it converts the character to upper case if it is a US-ASCII lower case letter; any other character is copied unchanged.

ldap_utf8_charcpy() copies a single character from src to dst without modifying its case.
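Both copy functions move one complete character, which may span several bytes. The sketch below illustrates the upper-casing variant under stated assumptions: sketch_utf8_toupper is a hypothetical name, and returning the number of bytes copied is an assumption, since this section does not document the return value of ldap_utf8_toupper() or ldap_utf8_charcpy(). The destination must have room for up to four bytes.

```c
/* Hypothetical single-character copy with US-ASCII upper-casing.
 * Returns the number of bytes written to dst (assumed convention). */
static int sketch_utf8_toupper(unsigned char *dst, const unsigned char *src)
{
    int n = 1;

    if (src[0] >= 'a' && src[0] <= 'z') {
        dst[0] = (unsigned char)(src[0] - 'a' + 'A');
        return 1;
    }

    dst[0] = src[0];
    /* A multi-byte lead byte is followed by continuation bytes of the
     * form 10xxxxxx; copy them unchanged, at most three of them. */
    if (src[0] >= 0xC0) {
        while (n < 4 && (src[n] & 0xC0) == 0x80) {
            dst[n] = src[n];
            n++;
        }
    }
    return n;
}
```

Dropping the first branch gives the behavior described for a plain character copy.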

The following functions can be used to convert strings returned by LDAPv2 servers, which used the T.61 character set.

	int ldap_charset_t61_to_88591 ( 
		char **bufp, 
		unsigned long *buflenp, 
		int free_input 
	);

	int ldap_charset_88591_to_t61 ( 
		char **bufp, 
		unsigned long *buflenp, 
		int free_input 
	);

ldap_charset_t61_to_88591() and ldap_charset_88591_to_t61() convert a string from one character set to the other. Characters which cannot be represented in the target character set are replaced by a question mark character. The bufp argument should be the address of a pointer to the beginning of the string, and buflenp should be the address of an unsigned long value holding the length of the string. Both are updated by the call: on return, *bufp points to the newly allocated string and *buflenp describes it. The new string should be freed using ldap_memfree(). If free_input is 1, the input string originally pointed to by *bufp is freed with ldap_memfree() before the function returns.
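The in/out calling convention is easy to get wrong, so here is a self-contained sketch of it. The function sketch_to_ascii is hypothetical and does not perform a T.61 mapping: it merely replaces every byte outside US-ASCII with a question mark, to show the substitution rule and the buffer handling. It uses malloc() and free() where the SDK uses its own allocator and ldap_memfree(), and its return codes (0 for success, -1 for failure) are assumptions.

```c
#include <stdlib.h>

/* Hypothetical converter that follows the bufp/buflenp/free_input
 * convention. Not a real T.61 conversion. */
static int sketch_to_ascii(char **bufp, unsigned long *buflenp,
                           int free_input)
{
    unsigned long i, len = *buflenp;
    char *in = *bufp;
    char *out = malloc(len + 1);

    if (out == NULL)
        return -1;              /* input is left untouched on failure */

    for (i = 0; i < len; i++)
        out[i] = ((unsigned char)in[i] < 0x80) ? in[i] : '?';
    out[len] = '\0';

    if (free_input)
        free(in);               /* caller gave up ownership of the input */

    *bufp = out;                /* hand the new buffer back to the caller */
    *buflenp = len;
    return 0;
}
```

The caller passes the addresses of its own pointer and length variables, then reads both back after the call and eventually frees the new buffer.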

The following functions can be used to convert between UTF-8 and US-ASCII, 16-bit Unicode or ISO-8859-1. Each function returns a newly allocated string, which must be freed using ldap_memfree().

	unsigned char *ldap_charset_unicode_to_utf8 (
		unsigned short *src,
		int sl
	);

	unsigned char *ldap_charset_88591_to_utf8 (
		unsigned char *src
	);	

	unsigned short *ldap_charset_utf8_to_unicode (
		unsigned char *src
	);

	unsigned char *ldap_charset_utf8_to_88591 (
		unsigned char *src
	);

	char *ldap_charset_utf8_to_ascii (
		unsigned char *src
	);
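To make the relationship between ISO-8859-1 and UTF-8 concrete, the sketch below converts a NUL-terminated ISO-8859-1 string to UTF-8. It is an illustration of the encoding rule, not the SDK's code: sketch_88591_to_utf8 is a hypothetical name, and the result is allocated with malloc() and released with free(), whereas results from the SDK functions above must be released with ldap_memfree().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical ISO-8859-1 to UTF-8 conversion. Returns a newly
 * allocated string, or NULL if memory cannot be allocated. */
static unsigned char *sketch_88591_to_utf8(const unsigned char *src)
{
    size_t len = strlen((const char *)src);
    unsigned char *out = malloc(2 * len + 1);   /* worst case: 2 bytes each */
    unsigned char *p = out;

    if (out == NULL)
        return NULL;

    for (; *src != '\0'; src++) {
        if (*src < 0x80) {
            *p++ = *src;                         /* US-ASCII: unchanged */
        } else {
            /* Code points 0x80..0xFF become 110000xx 10xxxxxx. */
            *p++ = (unsigned char)(0xC0 | (*src >> 6));
            *p++ = (unsigned char)(0x80 | (*src & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```

Because ISO-8859-1 code points coincide with the first 256 Unicode code points, no lookup table is needed for this direction.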
