1. Character Sets

1.1 Percent-encode (URL-encode) a string

  Requires the percent-encoding crate, which can be installed with cargo add percent-encoding:

[dependencies]
percent-encoding = "2.2.0"

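  Use the utf8_percent_encode function from the percent-encoding crate to percent-encode (URL-encode) an input string, and the percent_decode function to decode it. The encode set defines which bytes (in addition to non-ASCII bytes and controls) must be percent-encoded; the right set depends on context. For example, ? must be encoded in a URL path but not in a query string. Encoding returns an iterator of &str slices, which is then collected into a String.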

use percent_encoding::{percent_decode, utf8_percent_encode, AsciiSet, CONTROLS};
use std::str::Utf8Error;

/// Based on https://url.spec.whatwg.org/#fragment-percent-encode-set,
/// with b',' added so that the comma in the input is encoded as well.
const FRAGMENT: &AsciiSet = &CONTROLS
    .add(b' ')
    .add(b'"')
    .add(b'<')
    .add(b'>')
    .add(b'`')
    .add(b',');

fn main() -> Result<(), Utf8Error> {
    let input = "confident, productive systems programming";

    let iter = utf8_percent_encode(input, FRAGMENT);
    let encoded: String = iter.collect();
    assert_eq!(encoded, "confident%2C%20productive%20systems%20programming");

    let iter = percent_decode(encoded.as_bytes());
    let decoded = iter.decode_utf8()?;
    assert_eq!(decoded, "confident, productive systems programming");

    Ok(())
}
  • Run cargo run to verify
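
  To make the role of the encode set more concrete, here is a minimal std-only sketch. It is not how the percent-encoding crate is implemented; the function percent_encode_with and its extra parameter are hypothetical names used only for illustration. Non-ASCII bytes and ASCII controls are always escaped, and the caller chooses which additional bytes to escape.

```rust
/// Percent-encode `input`. Bytes that are non-ASCII, ASCII controls, or listed
/// in `extra` are written as %XX. `extra` is a hypothetical stand-in for the
/// crate's AsciiSet.
fn percent_encode_with(input: &str, extra: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    for &byte in input.as_bytes() {
        let must_escape =
            !byte.is_ascii() || byte.is_ascii_control() || extra.contains(&byte);
        if must_escape {
            out.push_str(&format!("%{:02X}", byte));
        } else {
            out.push(byte as char);
        }
    }
    out
}

fn main() {
    let input = "a b?c";
    // Path-like context: '?' would start the query string, so it is escaped.
    println!("{}", percent_encode_with(input, b" ?")); // a%20b%3Fc
    // Query-like context: '?' is left alone.
    println!("{}", percent_encode_with(input, b" ")); // a%20b?c
}
```

  The same input yields different output depending on the set, which is why the set must be chosen per URL component.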

1.2 Encode a string as application/x-www-form-urlencoded

  Requires the url crate, which can be installed with cargo add url:

[dependencies]
url = "2.3.1"

  The example below encodes a string into application/x-www-form-urlencoded syntax with form_urlencoded::byte_serialize and then decodes it with form_urlencoded::parse. Both functions return iterators, which are collected into a String.

use url::form_urlencoded::{byte_serialize, parse};

fn main() {
    let urlencoded: String = byte_serialize("What is ❤?".as_bytes()).collect();
    assert_eq!(urlencoded, "What+is+%E2%9D%A4%3F");
    println!("urlencoded:'{}'", urlencoded);

    // parse yields (key, value) pairs; the input has no '=', so the value is empty.
    let decoded: String = parse(urlencoded.as_bytes())
        .map(|(key, val)| [key, val].concat())
        .collect();
    assert_eq!(decoded, "What is ❤?");
    println!("decoded:'{}'", decoded);
}
  • Output of cargo run:
urlencoded:'What+is+%E2%9D%A4%3F'
decoded:'What is ❤?'
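
  The form syntax differs from plain percent-encoding mainly in that a space becomes +. The following std-only sketch illustrates the byte-level rules under that assumption; form_byte_serialize, form_decode and hex_val are hypothetical helpers, not functions of the url crate.

```rust
/// Encode bytes in form syntax: space -> '+', ASCII alphanumerics and
/// `*`, `-`, `.`, `_` unchanged, every other byte as %XX.
fn form_byte_serialize(input: &[u8]) -> String {
    let mut out = String::new();
    for &b in input {
        match b {
            b' ' => out.push('+'),
            b'*' | b'-' | b'.' | b'_' => out.push(b as char),
            _ if b.is_ascii_alphanumeric() => out.push(b as char),
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Value of a single hexadecimal digit, if `b` is one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decode form syntax: '+' -> space, %XX -> byte, anything else copied as-is.
fn form_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(if b == b'+' { b' ' } else { b });
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

fn main() {
    let encoded = form_byte_serialize("What is \u{2764}?".as_bytes());
    println!("{}", encoded);
    println!("{}", form_decode(&encoded));
}
```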

1.3 Encode and decode hex

  Requires the data-encoding crate, which can be installed with cargo add data-encoding:

[dependencies]
data-encoding = "2.3.3"

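  The data-encoding crate provides the HEXUPPER::encode method, which takes a &[u8] and returns a String containing the hexadecimal representation of the data. Similarly, it provides the HEXUPPER::decode method, which takes a &[u8] and returns a Vec<u8> if the input is successfully decoded. The example below converts &[u8] data to its hexadecimal equivalent and compares the result with the expected value.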

use data_encoding::{DecodeError, HEXUPPER};

fn main() -> Result<(), DecodeError> {
    let original = b"The quick brown fox jumps over the lazy dog.";
    let expected = "54686520717569636B2062726F776E20666F78206A756D7073206F76\
        657220746865206C617A7920646F672E";

    let encoded = HEXUPPER.encode(original);
    assert_eq!(encoded, expected);

    let decoded = HEXUPPER.decode(&encoded.into_bytes())?;
    assert_eq!(&decoded[..], &original[..]);

    Ok(())
}
  • Run cargo run to verify
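
  For readers curious about what the hex mapping does at the byte level, here is a minimal std-only sketch. It is an illustration under simple assumptions, not the data-encoding implementation; hex_encode_upper and hex_decode_upper are hypothetical names. Each byte is split into two 4-bit halves, and each half indexes the uppercase digit table.

```rust
const HEX_DIGITS: &[u8; 16] = b"0123456789ABCDEF";

/// Encode each byte as two uppercase hexadecimal digits.
fn hex_encode_upper(data: &[u8]) -> String {
    let mut out = String::with_capacity(data.len() * 2);
    for &b in data {
        out.push(HEX_DIGITS[(b >> 4) as usize] as char);
        out.push(HEX_DIGITS[(b & 0x0F) as usize] as char);
    }
    out
}

/// Decode uppercase hexadecimal text; returns None on odd length or on any
/// character outside 0-9 / A-F (lowercase is rejected in this sketch).
fn hex_decode_upper(text: &str) -> Option<Vec<u8>> {
    let bytes = text.as_bytes();
    if bytes.len() % 2 != 0 {
        return None;
    }
    bytes
        .chunks(2)
        .map(|pair| {
            let hi = HEX_DIGITS.iter().position(|&d| d == pair[0])?;
            let lo = HEX_DIGITS.iter().position(|&d| d == pair[1])?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}

fn main() {
    let encoded = hex_encode_upper(b"fox");
    println!("{}", encoded); // 666F78
    println!("{:?}", hex_decode_upper(&encoded));
}
```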

1.4 Encode and decode base64

  Requires the base64 and error-chain crates, which can be installed with cargo add base64 and cargo add error-chain:

[dependencies]
base64 = "0.21.0"
error-chain = "0.12.4"

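  Encodes a byte slice into a base64 string with encode and decodes a base64 string with decode. Both are methods of the Engine trait; the example uses the general_purpose::STANDARD_NO_PAD engine, so the encoded text carries no trailing = padding.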

use error_chain::error_chain;

use base64::{engine::general_purpose, Engine as _};
use std::str;

error_chain! {
    foreign_links {
        Base64(base64::DecodeError);
        Utf8Error(str::Utf8Error);
    }
}

fn main() -> Result<()> {
    let hello = b"hello rustaceans";
    let encoded = general_purpose::STANDARD_NO_PAD.encode(hello);
    let decoded = general_purpose::STANDARD_NO_PAD.decode(&encoded)?;

    println!("original: {}", str::from_utf8(hello)?);
    println!("base64-encoded: {}", encoded);
    println!("decoded back: {}", str::from_utf8(&decoded)?);

    Ok(())
}
  • Output of cargo run:
original: hello rustaceans
base64-encoded: aGVsbG8gcnVzdGFjZWFucw
decoded back: hello rustaceans
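
  The unpadded output can be reproduced by hand. The sketch below is a std-only illustration of the standard base64 alphabet without padding; base64_encode_no_pad is a hypothetical helper and not part of the base64 crate. Every group of up to three bytes is packed into 24 bits and emitted as 6-bit symbols; a trailing group of one or two bytes simply produces fewer symbols instead of = characters.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode bytes with the standard base64 alphabet and no '=' padding.
fn base64_encode_no_pad(data: &[u8]) -> String {
    let mut out = String::new();
    for chunk in data.chunks(3) {
        // Pack up to three bytes into the top of a 24-bit group.
        let mut group = 0u32;
        for (i, &b) in chunk.iter().enumerate() {
            group |= (b as u32) << (16 - 8 * i);
        }
        // 1 byte -> 2 symbols, 2 bytes -> 3 symbols, 3 bytes -> 4 symbols.
        for i in 0..=chunk.len() {
            let index = (group >> (18 - 6 * i)) & 0x3F;
            out.push(ALPHABET[index as usize] as char);
        }
    }
    out
}

fn main() {
    println!("{}", base64_encode_no_pad(b"hello rustaceans"));
}
```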

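2. CSV processing

  Requires the csv and serde crates, which can be installed with cargo add csv and cargo add serde --features derive. Recipes 2.3, 2.5 and 2.6 also use the error-chain crate from section 1.4.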

[dependencies]
csv = "1.2.1"
serde = { version = "1.0.160", features = ["derive"] }

2.1 Read CSV records

  Reads standard CSV records into csv::StringRecord, a weakly typed data representation that expects every row to be valid UTF-8. Alternatively, csv::ByteRecord makes no assumptions about UTF-8.

use csv::Error;

fn main() -> Result<(), Error> {
    let csv = "year,make,model,description
1948,Porsche,356,Luxury sports car
1967,Ford,Mustang fastback 1967,American car";

    let mut reader = csv::Reader::from_reader(csv.as_bytes());
    for record in reader.records() {
        let record = record?;
        println!(
            "In {}, {} built the {} model. It is a {}.",
            &record[0], &record[1], &record[2], &record[3]
        );
    }

    Ok(())
}
  • Output of cargo run:
In 1948, Porsche built the 356 model. It is a Luxury sports car.
In 1967, Ford built the Mustang fastback 1967 model. It is a American car.
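
  The UTF-8 requirement behind the StringRecord / ByteRecord distinction can be seen with the standard library alone. This is a small sketch, not csv crate code; field_as_text is a hypothetical helper. A field can be viewed as text only when its bytes are valid UTF-8; otherwise it can only be handled as raw bytes.

```rust
use std::str;

/// View a raw field as text, which only works when the bytes are valid UTF-8.
fn field_as_text(raw: &[u8]) -> Option<&str> {
    str::from_utf8(raw).ok()
}

fn main() {
    // "café" encoded as UTF-8: usable as a string field.
    println!("{:?}", field_as_text(b"caf\xC3\xA9"));
    // "café" encoded as Latin-1: not valid UTF-8, so it must stay as bytes.
    println!("{:?}", field_as_text(b"caf\xE9"));
}
```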

  Serde deserializes data into strongly typed structures. See the csv::Reader::deserialize method for details.

use serde::Deserialize;

#[derive(Deserialize)]
struct Record {
    year: u16,
    make: String,
    model: String,
    description: String,
}

fn main() -> Result<(), csv::Error> {
    let csv = "year,make,model,description
1948,Porsche,356,Luxury sports car
1967,Ford,Mustang fastback 1967,American car";

    let mut reader = csv::Reader::from_reader(csv.as_bytes());

    for record in reader.deserialize() {
        let record: Record = record?;
        println!(
            "In {}, {} built the {} model. It is a {}.",
            record.year, record.make, record.model, record.description
        );
    }

    Ok(())
}
  • Output of cargo run:
In 1948, Porsche built the 356 model. It is a Luxury sports car.
In 1967, Ford built the Mustang fastback 1967 model. It is a American car.

2.2 Read CSV records with a different delimiter

  Reads CSV records that use a tab character as the delimiter.

use csv::{Error, ReaderBuilder};
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Record {
    name: String,
    place: String,
    #[serde(deserialize_with = "csv::invalid_option")]
    id: Option<u64>,
}

fn main() -> Result<(), Error> {
    let data = "name\tplace\tid
Mark\tMelbourne\t46
Ashley\tZurich\t92";

    // Override the default comma delimiter with a tab.
    let mut reader = ReaderBuilder::new()
        .delimiter(b'\t')
        .from_reader(data.as_bytes());
    for result in reader.deserialize::<Record>() {
        println!("{:?}", result?);
    }

    Ok(())
}
  • Output of cargo run:
Record { name: "Mark", place: "Melbourne", id: Some(46) }
Record { name: "Ashley", place: "Zurich", id: Some(92) }

2.3 Filter CSV records matching a predicate

  Returns only the rows of data that contain a field equal to query.

use error_chain::error_chain;

use std::io;

error_chain! {
    foreign_links {
        Io(std::io::Error);
        CsvError(csv::Error);
    }
}

fn main() -> Result<()> {
    let query = "CA";
    let data = "\
City,State,Population,Latitude,Longitude
Kenai,AK,7610,60.5544444,-151.2583333
Oakman,AL,,33.7133333,-87.3886111
Sandfort,AL,,32.3380556,-85.2233333
West Hollywood,CA,37031,34.0900000,-118.3608333";

    let mut rdr = csv::ReaderBuilder::new().from_reader(data.as_bytes());
    let mut wtr = csv::Writer::from_writer(io::stdout());

    // Always copy the header row to the output.
    wtr.write_record(rdr.headers()?)?;

    for result in rdr.records() {
        let record = result?;
        if record.iter().any(|field| field == query) {
            wtr.write_record(&record)?;
        }
    }

    wtr.flush()?;
    Ok(())
}
  • Output of cargo run:
City,State,Population,Latitude,Longitude
West Hollywood,CA,37031,34.0900000,-118.3608333

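2.4 Handle invalid CSV data with Serde

  CSV files often contain invalid data. For these cases, the csv crate provides a custom deserializer, csv::invalid_option, which automatically converts invalid data to None values.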

use csv::Error;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Record {
    name: String,
    place: String,
    #[serde(deserialize_with = "csv::invalid_option")]
    id: Option<u64>,
}

fn main() -> Result<(), Error> {
    // 46.5 and xyz are not valid u64 values, so those ids become None.
    let data = "name,place,id
mark,sydney,46.5
ashley,zurich,92
akshat,delhi,37
alisha,colombo,xyz";

    let mut rdr = csv::Reader::from_reader(data.as_bytes());
    for result in rdr.deserialize() {
        let record: Record = result?;
        println!("{:?}", record);
    }

    Ok(())
}
  • Output of cargo run:
Record { name: "mark", place: "sydney", id: None }
Record { name: "ashley", place: "zurich", id: Some(92) }
Record { name: "akshat", place: "delhi", id: Some(37) }
Record { name: "alisha", place: "colombo", id: None }
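
  The effect on the id column can be approximated with the standard library: a value that fails to parse as u64 is mapped to None instead of aborting the whole read. The sketch below only mirrors that behavior for a single field; lenient_id is a hypothetical helper and says nothing about how csv::invalid_option is implemented.

```rust
/// Parse a field as u64, turning any invalid value into None.
fn lenient_id(field: &str) -> Option<u64> {
    field.parse::<u64>().ok()
}

fn main() {
    // The same id values as in the recipe above.
    for raw in ["46.5", "92", "37", "xyz"] {
        println!("{:>4} -> {:?}", raw, lenient_id(raw));
    }
}
```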

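2.5 Serialize records to CSV

  This example shows how to serialize a Rust tuple. csv::Writer supports automatic serialization from Rust types into CSV records. write_record writes only simple records that contain string data; data with more complex values, such as integers, floats, and Options, is written with serialize. Because csv::Writer uses an internal buffer, always call flush explicitly when done.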

use error_chain::error_chain;

use std::io;

error_chain! {
    foreign_links {
        CSVError(csv::Error);
        IOError(std::io::Error);
    }
}

fn main() -> Result<()> {
    let mut wtr = csv::Writer::from_writer(io::stdout());

    // Header row: plain strings, so write_record is enough.
    wtr.write_record(&["Name", "Place", "ID"])?;

    // Tuples with mixed types go through serialize.
    wtr.serialize(("Mark", "Sydney", 87))?;
    wtr.serialize(("Ashley", "Dublin", 32))?;
    wtr.serialize((456, "Delhi", "A11"))?;

    wtr.flush()?;
    Ok(())
}
  • Output of cargo run:
Name,Place,ID
Mark,Sydney,87
Ashley,Dublin,32
456,Delhi,A11

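2.6 Transform a CSV column

  Transforms a CSV file containing color names and hexadecimal color values into one containing color names and RGB color values. The csv crate reads and writes the CSV data, and the serde crate deserializes the input rows and serializes the output rows. For details, see csv::Reader::deserialize, serde::Deserialize, and std::str::FromStr.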

use csv::{Reader, Writer};
use error_chain::error_chain;
use serde::{de, Deserialize, Deserializer};
use std::str::FromStr;

error_chain! {
    foreign_links {
        CsvError(csv::Error);
        ParseInt(std::num::ParseIntError);
        CsvInnerError(csv::IntoInnerError<Writer<Vec<u8>>>);
        IO(std::fmt::Error);
        UTF8(std::string::FromUtf8Error);
    }
}

#[derive(Debug)]
struct HexColor {
    red: u8,
    green: u8,
    blue: u8,
}

#[derive(Debug, Deserialize)]
struct Row {
    color_name: String,
    color: HexColor,
}

impl FromStr for HexColor {
    type Err = Error;

    // Parses "#rrggbb" into its three 8-bit channels.
    fn from_str(hex_color: &str) -> std::result::Result<Self, Self::Err> {
        let trimmed = hex_color.trim_matches('#');
        if trimmed.len() != 6 {
            Err("Invalid length of hex string".into())
        } else {
            Ok(HexColor {
                red: u8::from_str_radix(&trimmed[..2], 16)?,
                green: u8::from_str_radix(&trimmed[2..4], 16)?,
                blue: u8::from_str_radix(&trimmed[4..6], 16)?,
            })
        }
    }
}

impl<'de> Deserialize<'de> for HexColor {
    // Deserializes the field as a String, then reuses the FromStr parser.
    fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let s = String::deserialize(deserializer)?;
        FromStr::from_str(&s).map_err(de::Error::custom)
    }
}

fn main() -> Result<()> {
    let data = "color_name,color
红色,#ff0000
绿色,#00ff00
蓝色,#0000FF
长春花色,#ccccff
品红色,#ff00ff"
        .to_owned();
    let mut out = Writer::from_writer(vec![]);
    let mut reader = Reader::from_reader(data.as_bytes());
    for result in reader.deserialize::<Row>() {
        let res = result?;
        out.serialize((
            res.color_name,
            res.color.red,
            res.color.green,
            res.color.blue,
        ))?;
    }
    let written = String::from_utf8(out.into_inner()?)?;
    assert_eq!(Some("品红色,255,0,255"), written.lines().last());
    println!("{}", written);
    Ok(())
}

  • Output of cargo run:
红色,255,0,0
绿色,0,255,0
蓝色,0,0,255
长春花色,204,204,255
品红色,255,0,255

3. Structured Data

3.1 Serialize and deserialize unstructured JSON

  Requires the serde_json crate, which can be installed with cargo add serde_json:

[dependencies]
serde_json = "1.0.96"

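  The serde_json crate provides a from_str function for parsing a &str of JSON. Unstructured JSON can be parsed into the universal serde_json::Value type, which can represent any valid JSON data. The example below parses a &str of JSON and compares it with an expected value declared using the json! macro.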

use serde_json::json;
use serde_json::{Error, Value};

fn main() -> Result<(), Error> {
    let j = r#"{
        "userid": 103609,
        "verified": true,
        "access_privileges": [
            "用户",
            "管理员"
        ]
    }"#;

    let parsed: Value = serde_json::from_str(j)?;

    let expected = json!({
        "userid": 103609,
        "verified": true,
        "access_privileges": [
            "用户",
            "管理员"
        ]
    });

    assert_eq!(parsed, expected);

    Ok(())
}
  • Run cargo run to verify

3.2 Deserialize a TOML configuration file

  Requires the toml crate, which can be installed with cargo add toml:

[dependencies]
toml = "0.7.3"

  Parses some TOML configuration into a universal toml::Value, which can represent any valid TOML data.

use toml::{de::Error, Value};

fn main() -> Result<(), Error> {
    let toml_content = r#"
[package]
name = "your_package"
version = "0.1.0"
authors = ["You! <you@example.org>"]

[dependencies]
serde = "1.0"
"#;

    let package_info: Value = toml::from_str(toml_content)?;

    assert_eq!(package_info["dependencies"]["serde"].as_str(), Some("1.0"));
    assert_eq!(
        package_info["package"]["name"].as_str(),
        Some("your_package")
    );

    Ok(())
}
  • Run cargo run to verify

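  Parses TOML into custom structs using Serde. This variant also needs the serde crate with the derive feature, as listed in section 2.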

use serde::Deserialize;

use std::collections::HashMap;
use toml::de::Error;

#[derive(Deserialize)]
struct Config {
    package: Package,
    dependencies: HashMap<String, String>,
}

#[derive(Deserialize)]
struct Package {
    name: String,
    version: String,
    authors: Vec<String>,
}

fn main() -> Result<(), Error> {
    let toml_content = r#"
[package]
name = "your_package"
version = "0.1.0"
authors = ["You! <you@example.org>"]

[dependencies]
serde = "1.0"
"#;

    let package_info: Config = toml::from_str(toml_content)?;

    assert_eq!(package_info.package.name, "your_package");
    assert_eq!(package_info.package.version, "0.1.0");
    assert_eq!(package_info.package.authors, vec!["You! <you@example.org>"]);
    assert_eq!(package_info.dependencies["serde"], "1.0");

    Ok(())
}
  • Run cargo run to verify

3.3 Read and write integers in little-endian byte order

  Requires the byteorder crate, which can be installed with cargo add byteorder:

[dependencies]
byteorder = "1.4.3"

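  The byteorder crate can reverse the significant bytes of structured data. This may be necessary when receiving information over the network, for example when the bytes come from another system that uses a different byte order.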


use byteorder::{LittleEndian, ReadBytesExt, WriteBytesExt};
use std::io::Error;

#[derive(Default, PartialEq, Debug)]
struct Payload {
    kind: u8,
    value: u16,
}

fn main() -> Result<(), Error> {
    let original_payload = Payload::default();
    let encoded_bytes = encode(&original_payload)?;
    let decoded_payload = decode(&encoded_bytes)?;
    assert_eq!(original_payload, decoded_payload);
    Ok(())
}

// Writes one byte for kind, then value as two little-endian bytes.
fn encode(payload: &Payload) -> Result<Vec<u8>, Error> {
    let mut bytes = vec![];
    bytes.write_u8(payload.kind)?;
    bytes.write_u16::<LittleEndian>(payload.value)?;
    Ok(bytes)
}

// Reads the fields back in the same order and byte order.
fn decode(mut bytes: &[u8]) -> Result<Payload, Error> {
    let payload = Payload {
        kind: bytes.read_u8()?,
        value: bytes.read_u16::<LittleEndian>()?,
    };
    Ok(payload)
}
  • Run cargo run to verify
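
  Because Payload::default() is all zeros, the round trip above does not show the byte order itself. The sketch below uses only the standard library (u16::to_le_bytes and u16::from_le_bytes) with a non-zero value to make the layout visible; encode_le and decode_le are hypothetical helpers that assume the same one-byte kind plus two-byte value layout as the recipe.

```rust
/// Lay out `kind` followed by `value` in little-endian order (low byte first).
fn encode_le(kind: u8, value: u16) -> Vec<u8> {
    let mut bytes = vec![kind];
    bytes.extend_from_slice(&value.to_le_bytes());
    bytes
}

/// Read the same three-byte layout back; None if the length is wrong.
fn decode_le(bytes: &[u8]) -> Option<(u8, u16)> {
    match bytes {
        [kind, lo, hi] => Some((*kind, u16::from_le_bytes([*lo, *hi]))),
        _ => None,
    }
}

fn main() {
    let bytes = encode_le(7, 0x1234);
    // Little-endian stores 0x34 before 0x12.
    println!("{:?}", bytes);
    println!("{:?}", decode_le(&bytes));
    // Big-endian (network order) would store 0x12 first.
    println!("{:?}", 0x1234u16.to_be_bytes());
}
```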