Data Types
All values in Terbium have a type. The type of a value determines its structure and the operations that can be performed on it. For example, `1` and `"hello"` are both values, but they have different types. Certain operations that can be performed on a number cannot be performed on a string, and vice versa.
There is no single built-in type that represents every kind of number. Instead, numbers are represented by the integer and floating-point types, which are explained later.
Introduction to the Type System
Terbium is a statically-typed language. This means that every value has a type and that the type of any value must be known at compile-time. This is in contrast to dynamically-typed languages, where the type of a value may not be known until runtime.
Terbium is also a strongly-typed language. This means that the type of a value is strictly enforced. This is in contrast to weakly-typed languages, where the type of a value may be ignored or coerced. For example, Terbium will forbid you from adding an integer to a string.
Type Inference
While many statically-typed languages require you to explicitly declare the type of a value or variable, Terbium can perform type inference. This means that the compiler can automatically determine the type of a value or variable based on the context in which it is used. For example, the integer literal `1` can only be an integer; the compiler will never treat it as something else, such as a string.
All variables have a type
Similar to all values having a type, all variables enforce a type as well. This means that once a variable is declared, it can only be assigned values of the same type. This way, the compiler can ensure that the type of a variable is consistent throughout the program.
With type inference, you usually do not need to explicitly declare the type of a variable. The compiler can infer the type of a variable from the type of the value it is assigned. Nowhere in `let x = 1` is the type of `x` declared; the compiler infers that `x` must be an integer because it is assigned an integer.
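As a small sketch of inference at work, the declarations below use only literal forms that appear elsewhere in this chapter. The variable names are made up for illustration, and the inferred types in the comments follow from the descriptions given in this chapter rather than from compiler output.

```terbium
let count = 1;                          // integer literal: inferred as int
let ratio = 2.0;                        // floating-point literal: inferred as float
let letter = c'a';                      // character literal: inferred as char
let flag = true;                        // boolean literal: inferred as bool
let greeting = "hello";                 // string literal: inferred as string
let triple = (count, ratio, letter);    // tuple: inferred as (int, float, char)
```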
Type Annotations
If you want to explicitly declare the type of a variable, you can use a type annotation. A type annotation is a hint to the compiler that the variable should be of a certain type. You can write a type annotation by adding a colon and the type name after the variable name. For example, `let x: int = 1` explicitly declares that `x` is an integer.
In general, a variable declaration has the following form:
"let" IDENT [":" TYPE] ["=" EXPR]
Primitive Scalar Types
A scalar type is a type that represents a single value. Terbium provides five kinds of primitive scalar types: integers, floating-point numbers, booleans, characters, and the unit type `void`.
Integer Types
An integer is a number that can be written without a fractional component. Integers can take up a varying amount of space in memory, and they can be either signed (positive or negative) or unsigned (zero or positive only).
There is an integer type for each: the `int` type represents a signed integer, and the `uint` type represents an unsigned integer.
There are also integer types for different sizes of integers. The size of an integer is the number of bits it takes up in memory. The larger an integer's size, the greater range of values it can represent. For example, an 8-bit unsigned integer can only represent values from 0 to 255, while a 32-bit unsigned integer can represent values from 0 to 4,294,967,295.
Types are provided for all $N$-bit integers, signed and unsigned, where $N$ is a power of 2 and $8 \le N \le 128$:
Size | Signed | Unsigned |
---|---|---|
8 bits | `int8` | `uint8` |
16 bits | `int16` | `uint16` |
32 bits | `int32` | `uint32` |
64 bits | `int64` | `uint64` |
128 bits | `int128` | `uint128` |
Inferred size, defaulting to 32 bits | `int` | `uint` |
Pointer size, architecture-dependent | `intptr` | `uintptr` |
Integers can only hold values within a certain range. Values below this range result in integer underflow, and values above this range result in integer overflow.
The range for a signed integer of size $N$ bits is $-2^{N-1}$ to $2^{N-1} - 1$.
The range for an unsigned integer of size $N$ bits is $0$ to $2^{N} - 1$.
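As a worked check, assume the usual two's-complement bounds: $-2^{N-1}$ to $2^{N-1} - 1$ for signed integers and $0$ to $2^{N} - 1$ for unsigned integers. Substituting $N = 8$ and $N = 32$ reproduces the unsigned limits quoted earlier in this section:

```latex
\begin{align*}
\text{int8:}   &\quad -2^{7} = -128 \quad\text{to}\quad 2^{7} - 1 = 127 \\
\text{uint8:}  &\quad 0 \quad\text{to}\quad 2^{8} - 1 = 255 \\
\text{int32:}  &\quad -2^{31} = -2{,}147{,}483{,}648 \quad\text{to}\quad 2^{31} - 1 = 2{,}147{,}483{,}647 \\
\text{uint32:} &\quad 0 \quad\text{to}\quad 2^{32} - 1 = 4{,}294{,}967{,}295
\end{align*}
```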
The `intptr` and `uintptr` types are special types that are the same size as a pointer on the current architecture. Pointer-sized integers are used for pointer arithmetic and for indexing into a collection. The size of a pointer is 32 bits on 32-bit architectures and 64 bits on 64-bit architectures.
Floating-point Types
A floating-point number is a number that can be written with a fractional component. Floating-point numbers can take up either 32 or 64 bits in memory. The larger the size of a floating-point number, the greater range of values it can represent and the more precise it can be.
There are two floating-point types: `float32` and `float64`. The `float32` type represents a 32-bit floating-point number, and the `float64` type represents a 64-bit floating-point number. The `float` type is an alias for `float64`.
Floating-point numbers are not precise. This means that they cannot represent every possible number within their range. This is why conditions like `0.1 + 0.2 == 0.3` are false, and why you should not use floating-point numbers for precision-sensitive calculations.
A workaround for comparing floating-point numbers is to use an epsilon value. An epsilon value is a small number that acts as a margin of error for floating-point comparisons. For example, the previous condition can be rewritten as `0.1 + 0.2 - 0.3 < 0.0001`. The epsilon value used here is `0.0001`. A convenience `EPSILON` constant is associated with floating-point types for this purpose (e.g. `0.1 + 0.2 - 0.3 < float.EPSILON`). Strictly speaking, the margin should be compared against the magnitude of the difference, because a large negative difference would also satisfy these conditions.
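To see why a margin is needed, it helps to work through the arithmetic. The figures below assume that `float` follows IEEE 754 binary64 and that `float.EPSILON` is the machine epsilon $2^{-52}$. Neither is stated in this section, so treat both as assumptions.

```latex
\begin{align*}
\mathrm{fl}(0.1 + 0.2) &= 0.3000000000000000444\ldots \\
\mathrm{fl}(0.3) &= 0.2999999999999999889\ldots \\
\mathrm{fl}(0.1 + 0.2) - \mathrm{fl}(0.3) &= 2^{-54} \approx 5.55 \times 10^{-17} \\
\varepsilon = 2^{-52} &\approx 2.22 \times 10^{-16} \\
5.55 \times 10^{-17} &< 2.22 \times 10^{-16}
\end{align*}
```

The difference is nonzero, so the `==` comparison fails, but it is smaller than the margin, so the epsilon comparison succeeds.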
The range for a 32-bit floating-point number is approximately $-3.4 \times 10^{38}$ to $3.4 \times 10^{38}$.
The range for a 64-bit floating-point number is approximately $-1.8 \times 10^{308}$ to $1.8 \times 10^{308}$.
The exact minimum and maximum values for floating-point numbers are available as the associated constants `float.MIN` and `float.MAX`.
Boolean Type
The `bool` type represents a boolean value. A boolean value can be either `true` or `false`. Booleans are used for conditions and control flow and represent the truth value of a condition, such as whether `x` is equal to 1.
The main way to use boolean values is through conditionals, such as an if expression. Conditionals will be covered in the Control Flow section.
Character Type
The `char` type represents a single Unicode character. A character is a single symbol, such as a letter, digit, or punctuation mark. Each character is represented by a Unicode Scalar Value, which is four bytes in size. Characters are written as a single character surrounded by single quotes and prefixed with a `c` (e.g. `c'a'` or `c'😎'`).
Void Type
The `void` type represents the absence of a value. It is used to indicate that a function or expression does not return a value. This is usually referred to as the unit type in other languages. The only value of the `void` type is `void`, and it is compatible with no other type.
The concept of void is not like null in other languages. Void is a type itself and is not compatible with any other type. Null in many other languages can be used in place of any type, which can lead to errors. While null can indicate the absence of a value that might otherwise exist, void cannot be used this way; that role is left to optional types.
Primitive Compound Types
Compound types are types that are made up of other types. There are three primitive compound types: tuples, fixed-size arrays, and array slices.
Tuples
A tuple is a fixed-size collection of values. It is a way of grouping together values with a variety of types into a single value. A tuple is written as a comma-separated list of values surrounded by parentheses. Its type is written similarly, but with the types of the values instead. For example:
(1, 2, 3) // A tuple of three integers
(1, 2.0, c'a') // A tuple of an integer, a float, and a character
let my_tuple: (int, float, char) = (1, 2.0, c'a') // explicit type annotation
let my_tuple = (1, 2.0, c'a') // type inference
To access individual elements of a tuple, you can use tuple indexing. Tuple indexing is done by writing the name of the tuple followed by a period and the index of the element you want to access. The index of the first element is `0`, and the index of the last element is the length of the tuple minus 1. For example:
let my_tuple = (1, 2.0, c'a');
assert_eq(my_tuple.0, 1);
assert_eq(my_tuple.1, 2.0);
assert_eq(my_tuple.2, c'a');
Fixed-size Arrays
Another way to store a collection of multiple values is with an array. Unlike a tuple, every element of an array must have the same type. An array is written as a comma-separated list of values surrounded by square brackets.
Terbium supports both fixed-size and growable arrays. Growable arrays require the use of a memory allocator which is not considered "primitive", so they will be covered in a later section. We will focus on fixed-size arrays for now.
A fixed-size array is an array whose size is known at compile-time. The type of a fixed-size array is written as `[ELEMENT_TYPE; LENGTH]`, where `ELEMENT_TYPE` is the type of each element and `LENGTH` is the number of elements in the array. For example:
[1, 2, 3] // A fixed-size array of three integers
[1, 2.0, 3] // Error: all elements must have the same type
let my_array: [int; 3] = [1, 2, 3] // explicit type annotation
let my_array = [1, 2, 3] // type inference
You can also initialize an array with a single value repeated a certain number of times. This is done by writing the value, followed by a semicolon, followed by the number of times to repeat the value. For example:
let arr = [0; 5];
assert_eq(arr, [0, 0, 0, 0, 0]);
You can access an element of an array by using the index operator, in which the index of the element is written inside square brackets after the array. Arrays are zero-indexed, meaning that the first element of an array is at index `0`, and the last element is at index `LENGTH - 1`. Negative indices count from the end of the array: when accessing a negative index $-i$, the index is expanded into `LENGTH - i`. For example:
let my_array = [1, 2, 3];
assert_eq(my_array[0], 1);
// Use variables as indices too:
let index = 1;
assert_eq(my_array[index], 2);
// Negative indices count from the end of the array:
assert_eq(my_array[-1], 3);
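To make the expansion concrete, here is a sketch with a four-element array, where index `-i` is read as `LENGTH - i`. It relies only on the indexing behavior described in this section. The array name is made up, and comparing two indexed elements with `assert_eq` is assumed to work the same way as comparing against a literal.

```terbium
let letters = [c'a', c'b', c'c', c'd']; // LENGTH is 4
assert_eq(letters[-1], letters[3]); // 4 - 1 = 3, the last element
assert_eq(letters[-2], letters[2]); // 4 - 2 = 2
assert_eq(letters[-4], letters[0]); // 4 - 4 = 0, the first element
```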
Indexing an array is bounds-checked, meaning that if the index is out of bounds, an error will be thrown. For example:
let my_array = [1, 2, 3];
my_array[3] // Error: index out of bounds
This error is usually caught at compile-time, but can also be a runtime error if the index is not known at compile-time. For this reason, if you are unsure whether an index is out of bounds, you can use the `get` method instead, which returns an optional value for the element so that you can handle the invalid index yourself. Optional values will be covered in the Optional Types section, and methods will be covered in the Methods section:
let my_array = [1, 2, 3];
assert_eq(my_array.get(3), .none); // We will cover the .none value in the Optional Types section
To change the value of an element in an array, you can use the index assignment operator, which is written as the index of the element followed by an equals sign and the new value. For example:
let mut my_array = [1, 2, 3]; // Add the mut keyword to make the array mutable
my_array[0] = 4;
assert_eq(my_array, [4, 2, 3]);
Array Slices
An array slice is a view into an array. It is a way of referencing a subset of an array without needing to allocate a new growable array. Array slices cannot be created directly, but are created by taking a slice of an array:
let my_array = [1, 2, 3, 4, 5];
let my_slice = my_array[1..4]; // A slice of the array from index 1 to 4 (exclusive)
assert_eq(my_slice, [2, 3, 4]);
The `1..4` is called a range. A range is a way of specifying a start and end value and is discussed more in the Range Types section.
The type of an array slice is written as `[ELEMENT_TYPE]`, e.g.:
let my_array = [1, 2, 3, 4, 5];
let my_slice: [int] = my_array[1..4];
Then, you can access and modify elements of the slice just like you would an array:
let mut my_array = [1, 2, 3, 4, 5];
let my_slice = mut my_array[1..4];
assert_eq(my_slice[0], 2);
my_slice[0] = 6;
assert_eq(my_slice[0], 6);
assert_eq(my_array, [1, 6, 3, 4, 5]);
String Slices
The special `string` type is internally just a `[uint8]` with additional UTF-8 validation. Unlike array slices, you cannot modify a string slice.
String literals are represented as `string`s in Terbium. For example:
let my_string = "Hello, world!";
assert_eq(my_string[0..1], "H"); // Taking a slice of a string slice returns another string slice
assert_eq(my_string[0], c'H'); // Taking a single character from a string slice returns a character
As shown above, the indexing operation on a string slice is special in that it returns a single character instead of a byte, since a byte could represent only a part of a character but not the whole character.
Range Types
A range is a value that specifies a start and end value. Ranges are used in a variety of places in Terbium, such as slicing arrays, iterating over a range of values, and more.
There are two types of ranges in Terbium: inclusive and exclusive. An inclusive range is written as `START..=END`, and an exclusive range is written as `START..END`.
Furthermore, the `START` and `END` values can be omitted, in which case the range becomes half-open if one of the values is omitted, or fully open if both values are omitted.
The following shows the corresponding mathematical ranges for each type of range:
Range Syntax | Type | Mathematical Range |
---|---|---|
`start..end` | `Range<Start, End>` | $[\text{start}, \text{end})$ |
`start..=end` | `RangeInclusive<Start, End>` | $[\text{start}, \text{end}]$ |
`..end` | `RangeTo<End>` | $(-\infty, \text{end})$ |
`..=end` | `RangeToInclusive<End>` | $(-\infty, \text{end}]$ |
`start..` | `RangeFrom<Start>` | $[\text{start}, \infty)$ |
`..` | `RangeFull` | $(-\infty, \infty)$ |
...where the meaning of $-\infty$ and $\infty$ is implementation-defined.
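The difference between the range forms is easiest to see when slicing. The sketch below assumes that every range form in the table can be used as a slice index. This chapter only demonstrates `start..end`, so the other lines are illustrative rather than confirmed behavior.

```terbium
let my_array = [1, 2, 3, 4, 5];
assert_eq(my_array[1..4], [2, 3, 4]);     // end excluded: indices 1, 2, 3
assert_eq(my_array[1..=3], [2, 3, 4]);    // end included: indices 1, 2, 3
assert_eq(my_array[..2], [1, 2]);         // from the start up to, but not including, index 2
assert_eq(my_array[..=2], [1, 2, 3]);     // from the start up to and including index 2
assert_eq(my_array[3..], [4, 5]);         // from index 3 to the end
assert_eq(my_array[..], [1, 2, 3, 4, 5]); // the whole array
```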
The `_` type
The `_` type is a special type that can be thought of as the inference or auto type. Remember, type annotations are hints to the compiler, but they do not need to fully specify the type of a value. You can use the `_` type to tell the compiler to infer just that part of the type.
For example, recall tuples, which are a collection of values of different types. Terbium is perfectly capable of inferring the types of the values in a tuple:
let my_tuple = (1, 2, 3); // The compiler infers the type of my_tuple to be (int, int, int)
Or, you can fully specify the type of the tuple:
let my_tuple: (int, int, int) = (1, 2, 3);
You can also specify only the type of the first element and let the compiler infer the rest:
let my_tuple: (int64, _, _) = (1, 2, 3); // The compiler infers the type of the tuple to be (int64, int, int)
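The placeholder does not have to come last in a tuple type. The first line below follows directly from the tuple behavior described in this section. The array line is an assumption: this chapter only shows `_` inside tuple types, and whether it is accepted as an array element type is not confirmed here.

```terbium
let point: (_, float, _) = (1, 2.0, c'a'); // inferred as (int, float, char)
let counts: [_; 3] = [1, 2, 3];            // assumed: inferred as [int; 3]
```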
Type Casting and Type Coercion
Type casting is the process of converting a value from one type to another. Terbium has two kinds of type casting: explicit and implicit.
Explicit type casting is simply known as type casting and is done using the `to` keyword. For example:
let my_int = 1;
let my_float = my_int to float; // Explicitly cast my_int to a float
assert_eq(my_float, 1.0);
Implicit type casting is known as type coercion and is done automatically by the compiler. For example:
let smaller_int: int32 = 1;
let larger_int: int64 = smaller_int; // Implicitly cast smaller_int to an int64
Type coercion is usually done when the compiler can prove that the value will not lose any information when being cast to the new type. In the example above, the compiler knows that `smaller_int`, a 32-bit integer, can be safely widened (i.e. cast to a larger bit-width) to an `int64` without losing any information.
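The sketch below puts casting and coercion side by side. The narrowing line assumes that `to` accepts integer targets and that a conversion which could lose information has to be written explicitly. This chapter shows `to` only with `float`, so that line is an illustration rather than confirmed behavior.

```terbium
let small: int32 = 7;
let wide: int64 = small;        // coercion: every int32 value fits in an int64
let narrow = wide to int32;     // assumed: narrowing is spelled out with `to`
assert_eq(narrow, small);       // 7 fits in both widths, so the value is unchanged
```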