|  | regex-syntax | 
|  | ============ | 
|  | This crate provides a robust regular expression parser. | 
|  |  | 
|  |  | 
|  |  | 
|  | ### Documentation | 
|  |  | 
|  | https://docs.rs/regex-syntax | 
|  |  | 
|  |  | 
|  | ### Overview | 
|  |  | 
|  | There are two primary types exported by this crate: `Ast` and `Hir`. The former | 
|  | is a faithful abstract syntax of a regular expression, and can be converted | 
|  | back to concrete syntax while mostly preserving the original form. The latter | 
|  | type is a high level intermediate representation of a regular | 
|  | expression that is amenable to analysis and compilation into byte codes or | 
|  | automata. An `Hir` achieves this by drastically simplifying the syntactic | 
|  | structure of the regular expression. While an `Hir` can be converted back to | 
|  | its equivalent concrete syntax, the result is unlikely to resemble the original | 
|  | concrete syntax that produced the `Hir`. | 
|  |  | 
|  |  | 
|  | ### Example | 
|  |  | 
|  | This example shows how to parse a pattern string into its HIR: | 
|  |  | 
|  | ```rust | 
|  | use regex_syntax::Parser; | 
|  | use regex_syntax::hir::{self, Hir}; | 
|  |  | 
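|  | // Parse the concrete syntax directly into an `Hir`. | 
|  | let hir = Parser::new().parse("a|b").unwrap(); | 
|  | // The pattern is an alternation of two Unicode literals. | 
|  | assert_eq!(hir, Hir::alternation(vec![ | 
|  |     Hir::literal(hir::Literal::Unicode('a')), | 
|  |     Hir::literal(hir::Literal::Unicode('b')), | 
|  | ])); | 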
|  | ``` | 
|  |  | 
|  |  | 
|  | ### Safety | 
|  |  | 
|  | This crate has no `unsafe` code and sets `forbid(unsafe_code)`. While it's | 
|  | possible this crate could use `unsafe` code in the future, the standard | 
|  | for doing so is extremely high. In general, most code in this crate is not | 
|  | performance critical, since it tends to be dwarfed by the time it takes to | 
|  | compile a regular expression into an automaton. There is therefore little need | 
|  | for extreme optimization, and so little need for `unsafe`. | 
|  |  | 
|  | The standard for using `unsafe` in this crate is extremely high because this | 
|  | crate is intended to be reasonably safe to use with user supplied regular | 
|  | expressions. Therefore, while there may be bugs in the regex parser itself, | 
|  | they should _never_ result in memory unsafety unless there is a bug in either | 
|  | the compiler or the standard library, since `regex-syntax` has zero | 
|  | dependencies. | 
|  |  | 
|  |  | 
|  | ### Crate features | 
|  |  | 
|  | By default, this crate bundles a fairly large amount of Unicode data tables | 
|  | (a source size of ~750KB). Because of their large size, one can disable some | 
|  | or all of these data tables. If a regular expression attempts to use Unicode | 
|  | data that is not available, then an error will occur when translating the `Ast` | 
|  | to the `Hir`. | 
|  |  | 
|  | The full set of features one can disable is listed | 
|  | [in the "Crate features" section of the documentation](https://docs.rs/regex-syntax/*/#crate-features). | 
|  |  | 
|  |  | 
|  | ### Testing | 
|  |  | 
|  | Simply running `cargo test` will give you very good coverage. However, because | 
|  | of the large number of features exposed by this crate, a `test` script is | 
|  | included in this directory which will test several feature combinations. This | 
|  | is the same script that is run in CI. | 
|  |  | 
|  |  | 
|  | ### Motivation | 
|  |  | 
|  | The primary purpose of this crate is to provide the parser used by `regex`. | 
|  | Specifically, this crate is treated as an implementation detail of `regex`, | 
|  | and is primarily developed for the needs of `regex`. | 
|  |  | 
|  | Since this crate is an implementation detail of `regex`, it may experience | 
|  | breaking change releases at a different cadence from `regex`. This is only | 
|  | possible because this crate is _not_ a public dependency of `regex`. | 
|  |  | 
|  | Another consequence of this de-coupling is that there is no direct way to | 
|  | compile a `regex::Regex` from a `regex_syntax::hir::Hir`. Instead, one must | 
|  | first convert the `Hir` to a string (via its `std::fmt::Display`) and then | 
|  | compile that via `Regex::new`. While this does repeat some work, compilation | 
|  | typically takes much longer than parsing. | 
|  |  | 
|  | Stated differently, the coupling between `regex` and `regex-syntax` exists only | 
|  | at the level of the concrete syntax. |