The STRING
data type stores a string of Unicode characters.
STRING
is not a data type supported by PostgreSQL. For PostgreSQL compatibility, CockroachDB supports additional aliases and STRING
-related types.
Aliases
CockroachDB supports the following alias for STRING
:
TEXT
Related types
For PostgreSQL compatibility, CockroachDB supports the following STRING
-related types and their aliases:
VARCHAR
(and aliasCHARACTER VARYING
)CHAR
(and aliasCHARACTER
)NAME
These types are functionally identical to STRING
.
CockroachDB also supports the single-byte "char"
special character type. As in PostgreSQL, this special type is intended for internal use in system catalogs, and has a storage size of 1 byte. CockroachDB truncates all values of type "char"
to a single character.
Length
To limit the length of a string column, use STRING(n)
, where n
is the maximum number of Unicode code points (normally thought of as "characters") allowed. This applies to all related types as well (e.g., to limit the length of a VARCHAR
type, use VARCHAR(n)
). To reduce performance issues caused by storing very large string values in indexes, Cockroach Labs recommends setting length limits on string-typed columns.
We strongly recommend adding size limits to all indexed columns, which includes columns in primary keys.
Values exceeding 1 MiB can lead to storage layer write amplification and cause significant performance degradation or even crashes due to OOMs (out of memory errors).
To add a size limit using CREATE TABLE
:
CREATE TABLE name (first STRING(100), last STRING(100));
To add a size limit using ALTER TABLE ... ALTER COLUMN
:
SET enable_experimental_alter_column_type_general = true;
ALTER TABLE name ALTER first TYPE STRING(99);
When inserting a STRING
value or a STRING
-related-type value:
- If the value is cast with a length limit (e.g.,
CAST('hello world' AS STRING(5))
), CockroachDB truncates to the limit. This applies toSTRING(n)
and all related types. - If the value exceeds the column's length limit, CockroachDB returns an error. This applies to
STRING(n)
and all related types. - For
STRING(n)
andVARCHAR(n)
/CHARACTER VARYING(n)
types, if the value is under the column's length limit, CockroachDB does not add space padding to the end of the value. For
CHAR(n)
/CHARACTER(n)
types, if the value is under the column's length limit, CockroachDB adds space padding from the end of the value to the length limit.Type Length CHARACTER
,CHARACTER(n)
,CHAR
,CHAR(n)
Fixed-length CHARACTER VARYING(n)
,VARCHAR(n)
,STRING(n)
Variable-length, with a limit TEXT
,VARCHAR
,CHARACTER VARYING
,STRING
Variable-length, with no limit "char"
(special type)1 byte
Syntax
A value of type STRING
can be expressed using a variety of formats.
See string literals for more details.
When printing out a STRING
value in the SQL shell, the shell uses the simple
SQL string literal format if the value doesn't contain special character,
or the escaped format otherwise.
Collations
STRING
values accept collations, which lets you sort strings according to language- and country-specific rules.
Size
The size of a STRING
value is variable, but it's recommended to keep values under 64 kilobytes to ensure performance. Above that threshold, write amplification and other considerations may cause significant performance degradation.
Examples
> CREATE TABLE strings (a STRING PRIMARY KEY, b STRING(4), c TEXT);
> SHOW COLUMNS FROM strings;
column_name | data_type | is_nullable | column_default | generation_expression | indices | is_hidden
--------------+-----------+-------------+----------------+-----------------------+-----------+------------
a | STRING | false | NULL | | {primary} | false
b | STRING(4) | true | NULL | | {primary} | false
c | STRING | true | NULL | | {primary} | false
(3 rows)
> INSERT INTO strings VALUES ('a1b2c3d4', 'e5f6', 'g7h8i9');
> SELECT * FROM strings;
a | b | c
+----------+------+--------+
a1b2c3d4 | e5f6 | g7h8i9
(1 row)
> CREATE TABLE aliases (a STRING PRIMARY KEY, b VARCHAR, c CHAR);
> SHOW COLUMNS FROM aliases;
column_name | data_type | is_nullable | column_default | generation_expression | indices | is_hidden
+-------------+-----------+-------------+----------------+-----------------------+-------------+-----------+
a | STRING | false | NULL | | {primary} | false
b | VARCHAR | true | NULL | | {primary} | false
c | CHAR | true | NULL | | {primary} | false
(3 rows)
Supported casting and conversion
STRING
values can be cast to any of the following data types:
Type | Details |
---|---|
ARRAY |
Requires supported ARRAY string format, e.g., '{1,2,3}' .Note that string literals can be implicitly cast to any supported ARRAY data type except BYTES , ENUM , JSONB , SERIAL , and the spatial data types Box2D , GEOGRAPHY , and GEOMETRY . |
BIT |
Requires supported BIT string format, e.g., '101001' or 'xAB' . |
BOOL |
Requires supported BOOL string format, e.g., 'true' . |
BYTES |
For more details, see here. |
DATE |
Requires supported DATE string format, e.g., '2016-01-25' . |
DECIMAL |
Requires supported DECIMAL string format, e.g., '1.1' . |
FLOAT |
Requires supported FLOAT string format, e.g., '1.1' . |
INET |
Requires supported INET string format, e.g, '192.168.0.1' . |
INT |
Requires supported INT string format, e.g., '10' . |
INTERVAL |
Requires supported INTERVAL string format, e.g., '1h2m3s4ms5us6ns' . |
TIME |
Requires supported TIME string format, e.g., '01:22:12' (microsecond precision). |
TIMESTAMP |
Requires supported TIMESTAMP string format, e.g., '2016-01-25 10:10:10.555555' . |
TSQUERY |
Requires supported TSQUERY string format, e.g., 'Requires & supported & TSQUERY & string & format' .Note that casting a string to a TSQUERY will not normalize the tokens into lexemes. To do so, use to_tsquery() , plainto_tsquery() , or phraseto_tsquery() . |
TSVECTOR |
Requires supported TSVECTOR string format, e.g., 'Requires supported TSVECTOR string format.' .Note that casting a string to a TSVECTOR will not normalize the tokens into lexemes. To do so, use to_tsvector() . |
UUID |
Requires supported UUID string format, e.g., '63616665-6630-3064-6465-616462656562' . |
STRING
vs. BYTES
While both STRING
and BYTES
can appear to have similar behavior in many situations, one should understand their nuance before casting one into the other.
STRING
treats all of its data as characters, or more specifically, Unicode code points. BYTES
treats all of its data as a byte string. This difference in implementation can lead to dramatically different behavior. For example, let's take a complex Unicode character such as ☃ (the snowman emoji):
> SELECT length('☃'::string);
length
+--------+
1
> SELECT length('☃'::bytes);
length
+--------+
3
In this case, LENGTH(string)
measures the number of Unicode code points present in the string, whereas LENGTH(bytes)
measures the number of bytes required to store that value. Each character (or Unicode code point) can be encoded using multiple bytes, hence the difference in output between the two.
Translate literals to STRING
vs. BYTES
A literal entered through a SQL client will be translated into a different value based on the type:
BYTES
gives a special meaning to the pair\x
at the beginning, and translates the rest by substituting pairs of hexadecimal digits to a single byte. For example,\xff
is equivalent to a single byte with the value of 255. For more information, see SQL Constants: String literals with character escapes.STRING
does not give a special meaning to\x
, so all characters are treated as distinct Unicode code points. For example,\xff
is treated as aSTRING
with length 4 (\
,x
,f
, andf
).
Cast hexadecimal digits to BIT
You can cast a STRING
value of hexadecimal digits prefixed by x
or X
to a BIT
value.
For example:
> SELECT 'XAB'::BIT(8)
bit
------------
10101011
(1 row)
Concatenate STRING
values with values of other types
STRING
values can be concatenated with any non-ARRAY
, non-NULL
type, resulting in a STRING
value.
For example:
> SELECT 1 || 'item';
?column?
------------
1item
(1 row)
> SELECT true || 'item';
?column?
------------
titem
(1 row)
Concatenating a STRING
value with a NULL
value results in a NULL
value.
For example:
> SELECT NULL || 'item';
?column?
------------
NULL
(1 row)
Convert STRING
to TIMESTAMP
You can use the parse_timestamp()
function to parse strings in TIMESTAMP
format.
SELECT parse_timestamp ('2022-05-28T10:53:25.160Z');
parse_timestamp
--------------------------
2022-05-28 10:53:25.16
(1 row)
Convert STRING
to TSVECTOR
You can use the to_tsvector()
function to parse strings in TSVECTOR
format. This will normalize the tokens into lexemes, and will add an integer position to each lexeme.
SELECT to_tsvector('How do trees get on the internet?');
to_tsvector
---------------------------------
'get':4 'internet':7 'tree':3
For more information on usage, see Full-Text Search.
Convert STRING
to TSQUERY
You can use the to_tsquery()
, plainto_tsquery()
, and phraseto_tsquery()
functions to parse strings in TSQUERY
format. This will normalize the tokens into lexemes.
When using to_tsquery()
, the string input must be formatted as a TSQUERY
, with operators separating tokens.
SELECT to_tsquery('How & do & trees & get & on & the & internet?');
to_tsquery
-------------------------------
'tree' & 'get' & 'internet'
For more information on usage, see Full-Text Search.