| Copyright | (c) Nicklas Botö 2020 |
|---|---|
| License | GPL-3 |
| Maintainer | git@nicbot.xyz |
| Stability | experimental |
| Safe Haskell | Trustworthy |
Zozo
Description
Library for constructing, exploring, and visualizing regular expressions and their
NFA form. Focusing heavily on the Brzozowski derivative
but also Thompson's construction (eventually).
Synopsis
- data Regex
- (<|>) :: Regex -> Regex -> Regex
- (<^>) :: Regex -> Regex -> Regex
- (***) :: Regex -> Regex -> Regex
- neg :: Regex -> Regex
- (\\) :: Regex -> Regex -> Regex
- r :: String -> Regex
- some :: Regex -> Regex
- many :: Regex -> Regex
- mayb :: Regex -> Regex
- alphabet :: String -> Regex
- unit :: Bool -> Regex
- nu :: Regex -> Regex
- derive :: String -> Regex -> Regex
- (^-) :: String -> Regex -> Regex
- rreduce :: String -> Regex -> Regex
- eval :: Regex -> Set String
- printMatching :: Regex -> IO ()
- match :: String -> Regex -> Bool
- (?) :: String -> Regex -> Bool
- alpha :: Regex
- numeric :: Regex
- int :: Regex
- alphaNum :: Regex
- space :: Regex
- spaces :: Regex
- whiteSpace :: Regex
- ascii :: Regex
- between :: Regex -> Regex -> Regex
- sepBy :: Regex -> Regex -> Regex
- (?>) :: Regex -> Regex -> Regex
- (<?) :: Regex -> Regex -> Regex
Language definition
A simple regex lang
Constructors
| Nil | The empty set |
| Eps | The empty string, \( \epsilon \) |
| Sym Char | A symbol as a singleton set |
| Union Regex Regex | Union of language sets \[ \Lambda \vee \Gamma \hspace{10pt} \Lambda,\Gamma \subseteq \Sigma^* \] |
| Inter Regex Regex | Intercept of language sets \[ \Lambda \wedge \Gamma \hspace{10pt} \Lambda,\Gamma \subseteq \Sigma^* \] |
| Comp Regex | Complement of language sets \[ \neg \Gamma \hspace{10pt} \Gamma \subseteq \Sigma^* \] |
| Conc Regex Regex | All possible concatenations |
| Star Regex |
Regex constructors
(\\) :: Regex -> Regex -> Regex infixl 6 Source #
Language set difference \[ \Lambda \backslash \Gamma \hspace{10pt} \Lambda,\Gamma \subseteq \Sigma^* \]
Specialized constructors
Treat string as concatenation of symbol regexes
r"foo" = Sym 'f' *** Sym 'o' *** Sym 'o'
alphabet :: String -> Regex Source #
Fold string over language union
numericChar = alphabet ['0'..'9']
unit :: Bool -> Regex Source #
\[ \mathbb{1}_x = \begin{cases} \epsilon \quad & x = \top \\ \varnothing \quad & \text{otherwise} \\ \end{cases} \]
Auxiliary function to derive where \[ \nu(R) = \begin{cases} \epsilon \quad & \epsilon \in R\\ \varnothing \quad & \text{otherwise} \\ \end{cases} \] Equivalent to
unit . (Eps mem)
Derivatives
derive :: String -> Regex -> Regex Source #
Defining the (Brzozowski) derivative of a language \(L\) with respect to the string \(u \in \Sigma^*\) to be \[ u^{-1} L = \left\{ v : uv \in L \right\} \]
Evaluation
printMatching :: Regex -> IO () Source #
match :: String -> Regex -> Bool Source #
Check if regex matches. Essentially checking if the derivative with respect to a string contains the empty string, \(\epsilon\).
Regex sets
whiteSpace :: Regex Source #
Any whitespace character
Set combinators
between :: Regex -> Regex -> Regex Source #
Regex inside delimiters
>>>"'a'" ? ascii `between` "'"True
sepBy :: Regex -> Regex -> Regex Source #
One or more regexes seperated by another regex
list :: Regex list = "[" <> int `sepBy` "," <> "]"