Understanding String Slicing in #Rust

One important thing to remember when slicing strings in #Rust is that slice indices are byte offsets, not character positions. For ASCII strings the two coincide, because every character is exactly 1 byte, but for strings containing multibyte (non-ASCII) characters they do not. Here's an example of code that compiles but panics at runtime because of this difference:

fn main() {
    let ascii_string = "foobar";
    let multibyte_string = "España";

    let length_in_bytes_ascii = ascii_string.len();
    let length_in_bytes_multibyte = multibyte_string.len();

    // Slicing strings
    let slice_ascii = &ascii_string[..3]; // This works fine
    let slice_multibyte = &multibyte_string[..5]; // This will panic at runtime

    println!("The length of the ASCII string in bytes is: {}", length_in_bytes_ascii);
    println!("The length of the multibyte string in bytes is: {}", length_in_bytes_multibyte);

    println!("The ASCII slice is: {}", slice_ascii);
    println!("The multibyte slice is: {}", slice_multibyte);
}
        

In the above code:

  • ascii_string is a simple ASCII string where each character is 1 byte.
  • multibyte_string contains a non-ASCII character: "ñ" takes 2 bytes in UTF-8, so the 6-character string "España" is 7 bytes long.

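When we try to slice multibyte_string using a range that doesn't align with character boundaries (like &multibyte_string[..5]), Rust panics. In "España", the "ñ" occupies bytes 4 and 5, so a slice ending at byte index 5 would cut that character in half. A &str must always hold valid UTF-8, so Rust refuses to produce the slice instead of returning a broken string.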
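To see where the boundaries actually fall, the byte offset of each character can be listed. The following is a minimal sketch using only the standard library; the helper name char_byte_spans is made up for illustration and is not part of the original example.

```rust
/// Hypothetical helper (not from the original post): returns, for each
/// character, its starting byte offset, the character, and its UTF-8 width.
fn char_byte_spans(s: &str) -> Vec<(usize, char, usize)> {
    s.char_indices()
        .map(|(offset, ch)| (offset, ch, ch.len_utf8()))
        .collect()
}

fn main() {
    let multibyte_string = "España";

    // Print where each character starts and how many bytes it occupies.
    for (offset, ch, width) in char_byte_spans(multibyte_string) {
        println!("byte {}: {:?} ({} byte(s))", offset, ch, width);
    }

    // Byte 5 lies inside "ñ", so it is not a valid place to cut the string.
    println!("boundary at 4? {}", multibyte_string.is_char_boundary(4));
    println!("boundary at 5? {}", multibyte_string.is_char_boundary(5));
    println!("boundary at 6? {}", multibyte_string.is_char_boundary(6));
}
```

Here is_char_boundary reports whether a given byte index is a safe place to start or end a slice.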

Lesson: Always ensure your slices align with valid UTF-8 character boundaries when working with multibyte strings in Rust!
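One way to follow this advice is sketched below. It assumes you want either a slice that fails gracefully or the first n characters of a string; first_chars is a hypothetical helper written for this illustration, while str::get and char_indices are standard library methods.

```rust
/// Hypothetical helper (not from the original post): returns the first `n`
/// characters of `s`, or all of `s` if it has `n` characters or fewer.
fn first_chars(s: &str, n: usize) -> &str {
    // The n-th character's byte offset is a valid boundary to slice at.
    match s.char_indices().nth(n) {
        Some((byte_offset, _)) => &s[..byte_offset],
        None => s,
    }
}

fn main() {
    let multibyte_string = "España";

    // str::get returns None instead of panicking on a bad boundary.
    match multibyte_string.get(..5) {
        Some(slice) => println!("The multibyte slice is: {}", slice),
        None => println!("Byte index 5 is not a character boundary"),
    }

    // Slicing by character count instead of by byte offset.
    println!("First 5 characters: {}", first_chars(multibyte_string, 5));
}
```

Note that char_indices works on Unicode scalar values, so text with combining marks may still need a grapheme-aware approach.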



