1. Character Sets
1.1 Percent-encode (URL-encode) a string
This example requires the percent-encoding crate, which can be installed with cargo add percent-encoding:

    [dependencies]
    percent-encoding = "2.2.0"
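Use the utf8_percent_encode function from the percent-encoding crate to percent-encode (URL-encode) an input string, and the percent_decode function to decode it. The encode set defines which bytes, in addition to non-ASCII bytes and control characters, must be percent-encoded; the right set depends on context. For example, the url crate encodes ? in a URL path but not in a query string. Encoding returns an iterator of &str slices, which is then collected into a String.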

    use percent_encoding::{percent_decode, utf8_percent_encode, AsciiSet, CONTROLS};
    use std::str::Utf8Error;

    // Bytes to escape in addition to control characters and non-ASCII bytes.
    const FRAGMENT: &AsciiSet = &CONTROLS
        .add(b' ')
        .add(b'"')
        .add(b'<')
        .add(b'>')
        .add(b'`')
        .add(b',');

    fn main() -> Result<(), Utf8Error> {
        let input = "confident, productive systems programming";

        // Encode: the iterator of &str slices is collected into a String.
        let iter = utf8_percent_encode(input, FRAGMENT);
        let encoded: String = iter.collect();
        assert_eq!(encoded, "confident%2C%20productive%20systems%20programming");

        // Decode back to the original text.
        let iter = percent_decode(encoded.as_bytes());
        let decoded = iter.decode_utf8()?;
        assert_eq!(decoded, "confident, productive systems programming");

        Ok(())
    }

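To see why the choice of encode set matters, here is a hand-rolled sketch that uses only the standard library. It is not how the percent-encoding crate is implemented; the function name percent_encode_sketch and the two byte sets are made up for illustration, on the assumption that a path context must escape ? while a query context need not.

```rust
/// Percent-encodes `input`, escaping control characters, non-ASCII bytes,
/// and every byte listed in `extra`. Illustrative sketch only.
fn percent_encode_sketch(input: &str, extra: &[u8]) -> String {
    let mut out = String::new();
    for &byte in input.as_bytes() {
        // 0x00..=0x1F and 0x7F are controls; anything above 0x7F is non-ASCII.
        let must_escape = byte < 0x20 || byte >= 0x7F || extra.contains(&byte);
        if must_escape {
            out.push_str(&format!("%{:02X}", byte));
        } else {
            out.push(byte as char);
        }
    }
    out
}

fn main() {
    // Hypothetical sets: the "path" set escapes `?`, the "query" set does not.
    let path_set = b" ?";
    let query_set = b" ";
    let input = "a b?c";

    println!("{}", percent_encode_sketch(input, path_set)); // a%20b%3Fc
    println!("{}", percent_encode_sketch(input, query_set)); // a%20b?c
}
```

The same input yields different output depending on the set, which is why the crate asks the caller to pick an AsciiSet rather than hard-coding one.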
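1.2 Encode a string as application/x-www-form-urlencoded
This example requires the url crate, which can be installed with cargo add url:

    [dependencies]
    url = "2.3.1"

The following example uses form_urlencoded::byte_serialize to encode a string into the application/x-www-form-urlencoded form syntax, then decodes it with form_urlencoded::parse. Both functions return iterators, which are collected into Strings.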

    use url::form_urlencoded::{byte_serialize, parse};

    fn main() {
        // Encode: spaces become '+', other reserved bytes become %XX.
        let urlencoded: String = byte_serialize("What is ❤?".as_bytes()).collect();
        assert_eq!(urlencoded, "What+is+%E2%9D%A4%3F");
        println!("urlencoded:'{}'", urlencoded);

        // Decode: parse yields (key, value) pairs, which are joined back together.
        let decoded: String = parse(urlencoded.as_bytes())
            .map(|(key, val)| [key, val].concat())
            .collect();
        assert_eq!(decoded, "What is ❤?");
        println!("decoded:'{}'", decoded);
    }

Output:

    urlencoded:'What+is+%E2%9D%A4%3F'
    decoded:'What is ❤?'

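The form syntax differs from plain percent-encoding mainly in how it treats spaces. The sketch below reproduces the output above using only the standard library, assuming the usual form-encoding rule that ASCII letters, digits, and the characters * - . _ pass through unchanged. The function form_urlencode_sketch is a hypothetical helper, not part of the url crate.

```rust
/// Encodes `input` in the style of application/x-www-form-urlencoded.
/// Illustrative sketch only; use the url crate in real code.
fn form_urlencode_sketch(input: &str) -> String {
    let mut out = String::new();
    for &byte in input.as_bytes() {
        match byte {
            // Unreserved bytes are copied as-is.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'*' | b'-' | b'.' | b'_' => {
                out.push(byte as char)
            }
            // A space is written as '+' rather than %20.
            b' ' => out.push('+'),
            // Everything else, including each UTF-8 byte of '❤', becomes %XX.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

fn main() {
    println!("{}", form_urlencode_sketch("What is ❤?"));
}
```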
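1.3 Encode and decode hex
This example requires the data-encoding crate, which can be installed with cargo add data-encoding:

    [dependencies]
    data-encoding = "2.3.3"

The data_encoding crate provides the HEXUPPER::encode method, which takes a &[u8] and returns a String of hexadecimal data. Likewise, HEXUPPER::decode takes a &[u8] and returns a Vec<u8> if the input decodes successfully. The example below converts &[u8] data to its hexadecimal equivalent, compares it with the expected value, and then decodes it back to the original bytes.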

    use data_encoding::{DecodeError, HEXUPPER};

    fn main() -> Result<(), DecodeError> {
        let original = b"The quick brown fox jumps over the lazy dog.";
        // The trailing backslash continues the string literal on the next line.
        let expected = "54686520717569636B2062726F776E20666F78206A756D7073206F76\
            657220746865206C617A7920646F672E";

        let encoded = HEXUPPER.encode(original);
        assert_eq!(encoded, expected);

        let decoded = HEXUPPER.decode(&encoded.into_bytes())?;
        assert_eq!(&decoded[..], &original[..]);

        Ok(())
    }

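For intuition, uppercase hex encoding simply writes each byte as two hexadecimal digits. The following standard-library sketch shows the idea; hex_upper_sketch and hex_decode_sketch are hypothetical helpers and do not try to reproduce the validation rules or error types of data_encoding.

```rust
/// Writes every byte as two uppercase hex digits.
fn hex_upper_sketch(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02X}", b)).collect()
}

/// Reads pairs of hex digits back into bytes; returns None on malformed input.
fn hex_decode_sketch(hex: &str) -> Option<Vec<u8>> {
    // Require an even number of characters, all of them ASCII hex digits.
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

fn main() {
    let encoded = hex_upper_sketch(b"The");
    println!("{}", encoded); // 546865
    println!("{:?}", hex_decode_sketch(&encoded));
}
```

The first six digits, 546865, match the start of the expected string in the example above.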
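1.4 Encode and decode base64
This example requires the base64 and error-chain crates, which can be installed with cargo add base64 and cargo add error-chain:

    [dependencies]
    base64 = "0.21.0"
    error-chain = "0.12.4"

Encode a byte slice into a base64 string with encode, and decode a base64 string with decode. In base64 0.21 both are methods of the Engine trait; the example uses the general_purpose::STANDARD_NO_PAD engine, which omits the trailing = padding.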

    use base64::{engine::general_purpose, Engine as _};
    use error_chain::error_chain;
    use std::str;

    // Generates the Error and Result types used by main.
    error_chain! {
        foreign_links {
            Base64(base64::DecodeError);
            Utf8Error(str::Utf8Error);
        }
    }

    fn main() -> Result<()> {
        let hello = b"hello rustaceans";
        let encoded = general_purpose::STANDARD_NO_PAD.encode(hello);
        let decoded = general_purpose::STANDARD_NO_PAD.decode(&encoded)?;

        println!("origin: {}", str::from_utf8(hello)?);
        println!("base64 encoded: {}", encoded);
        println!("back to origin: {}", str::from_utf8(&decoded)?);

        Ok(())
    }

Output:

    origin: hello rustaceans
    base64 encoded: aGVsbG8gcnVzdGFjZWFucw
    back to origin: hello rustaceans

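The encoded string above is 22 characters long and has no trailing = signs because the engine is unpadded. As a rough check, the lengths can be worked out with a few lines of arithmetic; the two helper functions below are hypothetical and only restate the general rule that base64 maps every 3 input bytes to 4 output characters.

```rust
/// Length of unpadded base64 output for `n` input bytes: ceil(4n / 3).
fn base64_len_unpadded(n: usize) -> usize {
    (n * 4 + 2) / 3
}

/// Length of padded base64 output: whole groups of 4 characters.
fn base64_len_padded(n: usize) -> usize {
    (n + 2) / 3 * 4
}

fn main() {
    let n = b"hello rustaceans".len(); // 16 bytes
    println!("unpadded: {}", base64_len_unpadded(n)); // 22
    println!("padded:   {}", base64_len_padded(n)); // 24
}
```

The two-character difference corresponds to the == that a padded engine would append for this input.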
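2. CSV Processing
The examples in this section require the csv and serde crates, which can be installed with cargo add csv and cargo add serde --features derive. Some examples also use error-chain, which was installed in section 1.4.

    [dependencies]
    csv = "1.2.1"
    serde = { version = "1.0.160", features = ["derive"] }

2.1 Read CSV records
Read standard CSV records into csv::StringRecord, a weakly typed data representation that requires each row to be valid UTF-8. Alternatively, csv::ByteRecord makes no assumptions about UTF-8.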

    use csv::Error;

    fn main() -> Result<(), Error> {
        // The first line is treated as the header row.
        let csv = "year,make,model,description
    1948,Porsche,356,Luxury sports car
    1967,Ford,Mustang fastback 1967,American car";

        let mut reader = csv::Reader::from_reader(csv.as_bytes());
        for record in reader.records() {
            let record = record?;
            println!(
                "In {}, {} built the {} model. It is a {}.",
                &record[0], &record[1], &record[2], &record[3]
            );
        }

        Ok(())
    }

Output:

    In 1948, Porsche built the 356 model. It is a Luxury sports car.
    In 1967, Ford built the Mustang fastback 1967 model. It is a American car.

Serde deserializes the data into strongly typed structs. See the csv::Reader::deserialize method for details.

    use serde::Deserialize;

    // Field names must match the CSV header row.
    #[derive(Deserialize)]
    struct Record {
        year: u16,
        make: String,
        model: String,
        description: String,
    }

    fn main() -> Result<(), csv::Error> {
        let csv = "year,make,model,description
    1948,Porsche,356,Luxury sports car
    1967,Ford,Mustang fastback 1967,American car";

        let mut reader = csv::Reader::from_reader(csv.as_bytes());
        for record in reader.deserialize() {
            let record: Record = record?;
            println!(
                "In {}, {} built the {} model. It is a {}.",
                record.year, record.make, record.model, record.description
            );
        }

        Ok(())
    }

Output:

    In 1948, Porsche built the 356 model. It is a Luxury sports car.
    In 1967, Ford built the Mustang fastback 1967 model. It is a American car.

2.2 Read CSV records with a different delimiter
Read CSV records that use a tab as the delimiter.

    use csv::{Error, ReaderBuilder};
    use serde::Deserialize;

    #[derive(Debug, Deserialize)]
    struct Record {
        name: String,
        place: String,
        #[serde(deserialize_with = "csv::invalid_option")]
        id: Option<u64>,
    }

    fn main() -> Result<(), Error> {
        let data = "name\tplace\tid
    Mark\tMelbourne\t46
    Ashley\tZurich\t92";

        // Configure the reader to split fields on tabs instead of commas.
        let mut reader = ReaderBuilder::new()
            .delimiter(b'\t')
            .from_reader(data.as_bytes());
        for result in reader.deserialize::<Record>() {
            println!("{:?}", result?);
        }

        Ok(())
    }

Output:

    Record { name: "Mark", place: "Melbourne", id: Some(46) }
    Record { name: "Ashley", place: "Zurich", id: Some(92) }

2.3 Filter CSV records matching a predicate
Return only the rows of data in which some field matches query.

    use error_chain::error_chain;
    use std::io;

    error_chain! {
        foreign_links {
            Io(std::io::Error);
            CsvError(csv::Error);
        }
    }

    fn main() -> Result<()> {
        let query = "CA";
        let data = "\
    City,State,Population,Latitude,Longitude
    Kenai,AK,7610,60.5544444,-151.2583333
    Oakman,AL,,33.7133333,-87.3886111
    Sandfort,AL,,32.3380556,-85.2233333
    West Hollywood,CA,37031,34.0900000,-118.3608333";

        let mut rdr = csv::ReaderBuilder::new().from_reader(data.as_bytes());
        let mut wtr = csv::Writer::from_writer(io::stdout());

        // Always copy the header row, then only the matching records.
        wtr.write_record(rdr.headers()?)?;
        for result in rdr.records() {
            let record = result?;
            if record.iter().any(|field| field == query) {
                wtr.write_record(&record)?;
            }
        }

        wtr.flush()?;
        Ok(())
    }

Output:

    City,State,Population,Latitude,Longitude
    West Hollywood,CA,37031,34.0900000,-118.3608333

2.4 Handle invalid CSV data with Serde
CSV files often contain invalid data. For these cases, the csv crate provides a custom deserializer, csv::invalid_option, which automatically converts invalid data to None.

    use csv::Error;
    use serde::Deserialize;

    #[derive(Debug, Deserialize)]
    struct Record {
        name: String,
        place: String,
        // Values that fail to parse as u64 become None instead of an error.
        #[serde(deserialize_with = "csv::invalid_option")]
        id: Option<u64>,
    }

    fn main() -> Result<(), Error> {
        let data = "name,place,id
    mark,sydney,46.5
    ashley,zurich,92
    akshat,delhi,37
    alisha,colombo,xyz";

        let mut rdr = csv::Reader::from_reader(data.as_bytes());
        for result in rdr.deserialize() {
            let record: Record = result?;
            println!("{:?}", record);
        }

        Ok(())
    }

Output:

    Record { name: "mark", place: "sydney", id: None }
    Record { name: "ashley", place: "zurich", id: Some(92) }
    Record { name: "akshat", place: "delhi", id: Some(37) }
    Record { name: "alisha", place: "colombo", id: None }

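The id values in the output above can be reproduced with the standard library alone, which may help clarify what "invalid" means for an Option<u64> column. This is only an analogy for the observable behavior, not the csv crate's implementation, and parse_id is a hypothetical helper.

```rust
/// Parses a CSV field as u64, mapping any parse failure to None.
fn parse_id(field: &str) -> Option<u64> {
    field.parse::<u64>().ok()
}

fn main() {
    // The same id fields as in the example data above.
    for field in ["46.5", "92", "37", "xyz"] {
        println!("{:?} -> {:?}", field, parse_id(field));
    }
}
```

Both 46.5 (not an integer) and xyz (not a number) fail to parse as u64, so they end up as None, while 92 and 37 become Some values.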
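2.5 Serialize records to CSV
This example shows how to serialize Rust tuples. csv::Writer supports automatic serialization from Rust types to CSV records. write_record writes only simple records that contain string data. Data with more complex values, such as numbers, floats, and options, is serialized with serialize. Because csv::Writer uses an internal buffer, always flush it explicitly when done.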

    use error_chain::error_chain;
    use std::io;

    error_chain! {
        foreign_links {
            CSVError(csv::Error);
            IOError(std::io::Error);
        }
    }

    fn main() -> Result<()> {
        let mut wtr = csv::Writer::from_writer(io::stdout());

        // Header row: plain strings, so write_record is enough.
        wtr.write_record(&["Name", "Place", "ID"])?;

        // Mixed-type tuples go through serialize.
        wtr.serialize(("Mark", "Sydney", 87))?;
        wtr.serialize(("Ashley", "Dublin", 32))?;
        wtr.serialize((456, "Delhi", "A11"))?;

        wtr.flush()?;
        Ok(())
    }

Output:

    Name,Place,ID
    Mark,Sydney,87
    Ashley,Dublin,32
    456,Delhi,A11

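2.6 Transform a CSV column
Convert a CSV file containing color names and hexadecimal color values into one with color names and RGB color values. The csv crate reads and writes the CSV data, and serde deserializes the input rows and serializes the output rows. For details, see csv::Reader::deserialize, serde::Deserialize, and std::str::FromStr.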

    use csv::{Reader, Writer};
    use error_chain::error_chain;
    use serde::{de, Deserialize, Deserializer};
    use std::str::FromStr;

    error_chain! {
        foreign_links {
            CsvError(csv::Error);
            ParseInt(std::num::ParseIntError);
            CsvInnerError(csv::IntoInnerError<Writer<Vec<u8>>>);
            IO(std::fmt::Error);
            UTF8(std::string::FromUtf8Error);
        }
    }

    #[derive(Debug)]
    struct HexColor {
        red: u8,
        green: u8,
        blue: u8,
    }

    #[derive(Debug, Deserialize)]
    struct Row {
        color_name: String,
        color: HexColor,
    }

    // Parse "#rrggbb" into its three color components.
    impl FromStr for HexColor {
        type Err = Error;

        fn from_str(hex_color: &str) -> std::result::Result<Self, Self::Err> {
            let trimmed = hex_color.trim_matches('#');
            if trimmed.len() != 6 {
                Err("Invalid length of hex string".into())
            } else {
                Ok(HexColor {
                    red: u8::from_str_radix(&trimmed[..2], 16)?,
                    green: u8::from_str_radix(&trimmed[2..4], 16)?,
                    blue: u8::from_str_radix(&trimmed[4..6], 16)?,
                })
            }
        }
    }

    // Let serde build a HexColor from a string field via FromStr.
    impl<'de> Deserialize<'de> for HexColor {
        fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
        where
            D: Deserializer<'de>,
        {
            let s = String::deserialize(deserializer)?;
            FromStr::from_str(&s).map_err(de::Error::custom)
        }
    }

    fn main() -> Result<()> {
        let data = "color_name,color
    red,#ff0000
    green,#00ff00
    blue,#0000FF
    periwinkle,#ccccff
    magenta,#ff00ff"
            .to_owned();
        let mut out = Writer::from_writer(vec![]);
        let mut reader = Reader::from_reader(data.as_bytes());
        for result in reader.deserialize::<Row>() {
            let res = result?;
            out.serialize((
                res.color_name,
                res.color.red,
                res.color.green,
                res.color.blue,
            ))?;
        }
        let written = String::from_utf8(out.into_inner()?)?;
        assert_eq!(Some("magenta,255,0,255"), written.lines().last());
        println!("{}", written);
        Ok(())
    }

Output:

    red,255,0,0
    green,0,255,0
    blue,0,0,255
    periwinkle,204,204,255
    magenta,255,0,255

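3. Structured Data
3.1 Serialize and deserialize unstructured JSON
This example requires the serde_json crate, which can be installed with cargo add serde_json:

    [dependencies]
    serde_json = "1.0.96"

The serde_json crate provides a from_str function for parsing a &str of JSON. Unstructured JSON can be parsed into the general-purpose serde_json::Value type, which can represent any valid JSON data. The example below parses a &str of JSON; the expected value is declared with the json! macro.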

    use serde_json::json;
    use serde_json::{Error, Value};

    fn main() -> Result<(), Error> {
        let j = r#"{
            "userid": 103609,
            "verified": true,
            "access_privileges": [
                "user",
                "admin"
            ]
        }"#;

        // Parse into the untyped Value representation.
        let parsed: Value = serde_json::from_str(j)?;

        // Build the same structure with the json! macro for comparison.
        let expected = json!({
            "userid": 103609,
            "verified": true,
            "access_privileges": [
                "user",
                "admin"
            ]
        });

        assert_eq!(parsed, expected);

        Ok(())
    }

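3.2 Deserialize a TOML configuration file
This example requires the toml crate, which can be installed with cargo add toml:

    [dependencies]
    toml = "0.7.3"

Parse some TOML configuration into a general-purpose toml::Value, which can represent any valid TOML data.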

    use toml::{de::Error, Value};

    fn main() -> Result<(), Error> {
        let toml_content = r#"
            [package]
            name = "your_package"
            version = "0.1.0"
            authors = ["You! <you@example.org>"]

            [dependencies]
            serde = "1.0"
            "#;

        let package_info: Value = toml::from_str(toml_content)?;

        // Index into the untyped value by table and key name.
        assert_eq!(package_info["dependencies"]["serde"].as_str(), Some("1.0"));
        assert_eq!(package_info["package"]["name"].as_str(), Some("your_package"));

        Ok(())
    }

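Use Serde to parse TOML into your own structs. This example also needs the serde crate with the derive feature, as installed in section 2.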

    use serde::Deserialize;
    use std::collections::HashMap;
    use toml::de::Error;

    #[derive(Deserialize)]
    struct Config {
        package: Package,
        dependencies: HashMap<String, String>,
    }

    #[derive(Deserialize)]
    struct Package {
        name: String,
        version: String,
        authors: Vec<String>,
    }

    fn main() -> Result<(), Error> {
        let toml_content = r#"
            [package]
            name = "your_package"
            version = "0.1.0"
            authors = ["You! <you@example.org>"]

            [dependencies]
            serde = "1.0"
            "#;

        let package_info: Config = toml::from_str(toml_content)?;

        assert_eq!(package_info.package.name, "your_package");
        assert_eq!(package_info.package.version, "0.1.0");
        assert_eq!(package_info.package.authors, vec!["You! <you@example.org>"]);
        assert_eq!(package_info.dependencies["serde"], "1.0");

        Ok(())
    }

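3.3 Read and write integers in little-endian byte order
This example requires the byteorder crate, which can be installed with cargo add byteorder:

    [dependencies]
    byteorder = "1.4.3"

The byteorder crate reads and writes numbers in an explicitly chosen byte order, which makes it possible to reverse the significant bytes of structured data. This may be necessary when receiving information over a network, because the bytes may come from a system with a different native byte order.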

    use byteorder::{LittleEndian, ReadBytesExt, WriteBytesExt};
    use std::io::Error;

    #[derive(Default, PartialEq, Debug)]
    struct Payload {
        kind: u8,
        value: u16,
    }

    fn main() -> Result<(), Error> {
        let original_payload = Payload::default();
        let encoded_bytes = encode(&original_payload)?;
        let decoded_payload = decode(&encoded_bytes)?;
        assert_eq!(original_payload, decoded_payload);
        Ok(())
    }

    // Write the fields in order; the u16 is stored least significant byte first.
    fn encode(payload: &Payload) -> Result<Vec<u8>, Error> {
        let mut bytes = vec![];
        bytes.write_u8(payload.kind)?;
        bytes.write_u16::<LittleEndian>(payload.value)?;
        Ok(bytes)
    }

    // Read the fields back in the same order and byte order.
    fn decode(mut bytes: &[u8]) -> Result<Payload, Error> {
        let payload = Payload {
            kind: bytes.read_u8()?,
            value: bytes.read_u16::<LittleEndian>()?,
        };
        Ok(payload)
    }

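Because the payload above uses default (zero) values, the byte order is not visible in the result. The following standard-library sketch, using a non-zero value, shows what little-endian layout looks like and what happens when the bytes are read with the wrong order. The helpers encode_le and decode_le are hypothetical names; they rely on the std methods u16::to_le_bytes and u16::from_le_bytes rather than on byteorder.

```rust
/// Lays out a u16 with the least significant byte first.
fn encode_le(value: u16) -> [u8; 2] {
    value.to_le_bytes()
}

/// Reassembles a u16 from little-endian bytes.
fn decode_le(bytes: [u8; 2]) -> u16 {
    u16::from_le_bytes(bytes)
}

fn main() {
    let value: u16 = 0x1234;
    let bytes = encode_le(value);
    println!("{:02X?}", bytes); // [34, 12]

    // Round trip with the matching byte order restores the value.
    assert_eq!(decode_le(bytes), value);

    // Reading the same bytes as big-endian swaps the two halves.
    println!("{:#06X}", u16::from_be_bytes(bytes)); // 0x3412
}
```

A sender and receiver therefore have to agree on one byte order, which is what the explicit LittleEndian type parameter in the example above expresses.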