Understanding String Slicing in #Rust
Jesús Flores
Senior Software Engineer at Factor Eleven | Video Game DM at Stone Goblin Games
One important thing to remember when slicing strings in #Rust is that slice ranges are byte offsets, not character positions. In an ASCII string every character is exactly one byte, so the two coincide. In a string with multibyte (non-ASCII) characters they don't, because UTF-8 encodes those characters in two to four bytes. Here's an example that compiles fine but panics at runtime because of this difference:
fn main() {
    let ascii_string = "foobar";
    let multibyte_string = "España";
    let length_in_bytes_ascii = ascii_string.len(); // 6 bytes, 6 characters
    let length_in_bytes_multibyte = multibyte_string.len(); // 7 bytes, 6 characters
    // Slicing strings
    let slice_ascii = &ascii_string[..3]; // This works fine: "foo"
    let slice_multibyte = &multibyte_string[..5]; // This will panic at runtime: byte 5 is inside the 2-byte 'ñ'
    // Never reached, because the slice above panics first
    println!("The length of the ASCII string in bytes is: {}", length_in_bytes_ascii);
    println!("The length of the multibyte string in bytes is: {}", length_in_bytes_multibyte);
    println!("The ASCII slice is: {}", slice_ascii);
    println!("The multibyte slice is: {}", slice_multibyte);
}
In the above code:
When we slice multibyte_string with a range that doesn't fall on a character boundary (like &multibyte_string[..5]), Rust panics at runtime. A &str must always be valid UTF-8, and a slice that cuts a multibyte character in half would break that guarantee.
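To see why byte index 5 is a problem, it can help to print where each character actually starts. The sketch below is only an illustration: `char_layout` is a hypothetical helper name, and it assumes the ñ is the single precomposed character U+00F1 (two bytes in UTF-8).

```rust
/// Hypothetical helper: returns (byte offset, character, UTF-8 length in bytes)
/// for each character in `s`.
fn char_layout(s: &str) -> Vec<(usize, char, usize)> {
    s.char_indices()
        .map(|(offset, ch)| (offset, ch, ch.len_utf8()))
        .collect()
}

fn main() {
    // Assumes 'ñ' is the precomposed U+00F1, which takes 2 bytes in UTF-8.
    let s = "España";
    for (offset, ch, len) in char_layout(s) {
        println!("byte {}: {:?} ({} byte(s))", offset, ch, len);
    }
    // A byte index is a valid slice endpoint only if it is a char boundary.
    println!("is_char_boundary(4) = {}", s.is_char_boundary(4)); // true: 'ñ' starts here
    println!("is_char_boundary(5) = {}", s.is_char_boundary(5)); // false: middle of 'ñ'
    println!("is_char_boundary(6) = {}", s.is_char_boundary(6)); // true: final 'a' starts here
}
```

The ñ occupies bytes 4 and 5, so a range ending at 5 stops halfway through it, while ranges ending at 4 or 6 are fine.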
Lesson: Always ensure your slices align with valid UTF-8 character boundaries when working with multibyte strings in Rust!
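One way to follow this lesson in practice, sketched here under a few assumptions: use `str::get`, which returns `None` instead of panicking when a range is not on a character boundary, or compute the byte offset from a character count with `char_indices`. The `first_n_chars` helper is a hypothetical name used only for this illustration, and "character" here means a Unicode scalar value (`char`), not a user-perceived grapheme.

```rust
/// Hypothetical helper: returns the first `n` characters (Unicode scalar
/// values) of `s`, or the whole string if it has fewer than `n`.
fn first_n_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // The byte offset where character number `n` starts is a safe endpoint.
        Some((byte_offset, _)) => &s[..byte_offset],
        None => s,
    }
}

fn main() {
    // Assumes 'ñ' is the precomposed U+00F1 (2 bytes in UTF-8).
    let s = "España";

    // `get` returns None rather than panicking on an invalid boundary.
    assert_eq!(s.get(..5), None);
    assert_eq!(s.get(..6), Some("Españ"));

    // Slicing by character count instead of byte count.
    assert_eq!(first_n_chars(s, 5), "Españ");
    println!("First five characters: {}", first_n_chars(s, 5));
}
```

`get` is handy when the index comes from outside your code and may be invalid. A helper like `first_n_chars` fits better when you really mean "the first N characters".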