diff --git a/src/util/README.md b/src/util/README.md index dde8fe0a120..d72acf4507f 100644 --- a/src/util/README.md +++ b/src/util/README.md @@ -10,90 +10,17 @@ This section discusses some of the key data-structures used in the CPROVER codebase. -\subsection irept Irept Data Structure +\subsection irept_section Irept Data Structure -There are a large number of kinds of tree structured or tree-like data in -CPROVER. `irept` provides a single, unified representation for all of -these, allowing structure sharing and reference counting of data. As -such `irept` is the basic unit of data in CPROVER. Each `irept` -contains[^1] a basic unit of data (of type `dt`) which contains four -things: +See \ref irept for more information. -* `data`: A string[^2], which is returned when the `id()` function is - used. +\subsection irep_idt_section Irep_idt and Dstringt -* `named_sub`: A map from `irep_namet` (a string) to `irept`. This - is used for named children, i.e. subexpressions, parameters, etc. - -* `comments`: Another map from `irep_namet` to `irept` which is used - for annotations and other ‘non-semantic’ information - -* `sub`: A vector of `irept` which is used to store ordered but - unnamed children. - -The `irept::pretty` function outputs the contents of an `irept` directly -and can be used to understand and debug problems with `irept`s. - -On their own `irept`s do not “mean” anything; they are effectively -generic tree nodes. Their interpretation depends on the contents of -result of the `id` function (the `data`) field. `util/irep_ids.txt` -contains the complete list of `id` values. During the build process it -is used to generate `util/irep_ids.h` which gives constants for each id -(named `ID_`). These can then be used to identify what kind of data -`irept` stores and thus what can be done with it. - -To simplify this process, there are a variety of classes that inherit -from `irept`, roughly corresponding to the ids listed (i.e. `ID_or` -(the string `"or”`) corresponds to the class `or_exprt`). These give -semantically relevant accessor functions for the data; effectively -different APIs for the same underlying data structure. None of these -classes add fields (only methods) and so static casting can be used. The -inheritance graph of the subclasses of `irept` is a useful starting -point for working out how to manipulate data. - -There are three main groups of classes (or APIs); those derived from -`typet`, `codet` and `exprt` respectively. Although all of these inherit -from `irept`, these are the most abstract level that code should handle -data. If code is manipulating plain `irept`s then something is wrong -with the architecture of the code. - -Many of the key descendent of `exprt` are declared in `std_expr.h`. All -expressions have a named subfield / annotation which gives the type of -the expression (slightly simplified from C/C++ as `unsignedbv_typet`, -`signedbv_typet`, `floatbv_typet`, etc.). All type conversions are -explicit with an expression with `id() == ID_typecast` and an ‘interface -class’ named `typecast_exprt`. One key descendent of `exprt` is -`symbol_exprt` which creates `irept` instances with the id of “symbol”. -These are used to represent variables; the name of which can be found -using the `get_identifier` accessor function. - -`codet` inherits from `exprt` and is defined in `std_code.h`. It -represents executable code; statements in C rather than expressions. In -the front-end there are versions of these that hold whole code blocks, -but in goto-programs these have been flattened so that each `irept` -represents one sequence point (almost one line of code / one -semi-colon). The most common descendents of `codet` are `code_assignt` -so a common pattern is to cast the `codet` to an assignment and then -recurse on the expression on either side. - -[^1]: Or references, if reference counted data sharing is enabled. It is - enabled by default; see the `SHARING` macro. - -[^2]: Unless `USE_STD_STRING` is set, this is actually -a `dstring` and thus an integer which is a reference into a string table - -\subsection irep_idt Irep_idt and Dstringt - -Inside `irept`s, strings are stored as `irep_idt`s, or `irep_namet`s for -keys to `named_sub` or `comments`. If `USE_STD_STRING` is set then both -`irep_idt` and `irep_namet` are `typedef`ed to `std::string`, but by default -it is not set and they are `typedef`ed to `dstringt`. `dstringt` has one -field, an unsigned integer which is an index into a static table of strings. -This makes it expensive to create a new string (because you have to look -through the whole table to see if it is already there, and add it if it -isn't) but very cheap to compare strings (just compare the two integers). It -also means that when you have lots of copies of the same string you only have -to store the whole string once, which saves space. +Inside \ref irept, strings are stored as irep_idts, or irep_namets for +keys to named_sub or comments. By default both irep_idt and irep_namet +are typedefed to \ref dstringt, unless USE_STD_STRING is set, in which +case they are typedefed to std::string (this can be used for debugging +purposes). \dot digraph G { diff --git a/src/util/dstring.h b/src/util/dstring.h index e0a98be9b20..7e72f543fd7 100644 --- a/src/util/dstring.h +++ b/src/util/dstring.h @@ -16,8 +16,20 @@ Author: Daniel Kroening, kroening@kroening.com #include "string_container.h" -// Marked final to disable inheritance. -// No virtual destructor, so runtime-polymorphic use would be unsafe. +/// \ref dstringt has one field, an unsigned integer \ref no which is an index +/// into a static table of strings. This makes it expensive to create a new +/// string(because you have to look through the whole table to see if it is +/// already there, and add it if it isn't) but very cheap to compare strings +/// (just compare the two integers). It also means that when you have lots of +/// copies of the same string you only have to store the whole string once, +/// which saves space. +/// +/// `irep_idt` and `irep_namet` are typedef-ed to \ref dstringt in irep.h unless +/// `USE_STD_STRING` is set. +/// +/// +/// Note: Marked final to disable inheritance. No virtual destructor, so +/// runtime-polymorphic use would be unsafe. class dstringt final { public: diff --git a/src/util/irep.h b/src/util/irep.h index 424839b5bc8..e7cea0710fc 100644 --- a/src/util/irep.h +++ b/src/util/irep.h @@ -82,8 +82,77 @@ inline const std::string &name2string(const irep_namet &n) class irept; const irept &get_nil_irep(); -/*! \brief Base class for tree-like data structures with sharing -*/ +/// \brief Base class for tree-like data structures with sharing +/// +/// There are a large number of kinds of tree structured or tree-like data in +/// CPROVER. \ref irept provides a single, unified representation for all of +/// these, allowing structure sharing and reference counting of data. As +/// such \ref irept is the basic unit of data in CPROVER. Each \ref irept +/// contains (or references, if reference counted data sharing is enabled, as +/// it is by default - see the `SHARING` macro) a basic unit of data (of type +/// \ref dt) which contains four things: +/// +/// * \ref irept::dt::data : A string, which is returned when the \ref id() +/// function is used. (Unless `USE_STD_STRING` is set, this is actually a +/// \ref dstringt and thus an integer which is a reference into a string +/// table.) +/// +/// * \ref irept::dt::named_sub : A map from `irep_namet` (a string) to \ref +/// irept. This is used for named children, i.e. subexpressions, parameters, +/// etc. +/// +/// * \ref irept::dt::comments : Another map from `irep_namet` to \ref irept +/// which is used for annotations and other ‘non-semantic’ information. Note +/// that this map is ignored by the default \ref operator==. +/// +/// * \ref irept::dt::sub : A vector of \ref irept which is used to store +/// ordered but unnamed children. +/// +/// The \ref irept::pretty function outputs the explicit tree structure of +/// an \ref irept and can be used to understand and debug problems with +/// `irept`s. +/// +/// On their own `irept`s do not "mean" anything; they are effectively +/// generic tree nodes. Their interpretation depends on the contents of +/// result of the \ref id() function, i.e. the `data` field. `util/irep_ids.def` +/// contains a list of `id` values. During the build process it is used +/// to generate `util/irep_ids.h` which gives constants for each id +/// (named `ID_`). You can also make `irep_idt`s which do not come from +/// `util/irep_ids.def`. An `irep_idt` can then be used to identify what +/// kind of data the \ref irept stores and thus what can be done with it. +/// +/// To simplify this process, there are a variety of classes that inherit +/// from \ref irept, roughly corresponding to the ids listed (i.e. `ID_or` +/// (the string "or”) corresponds to the class \ref or_exprt). These give +/// semantically relevant accessor functions for the data; effectively +/// different APIs for the same underlying data structure. None of these +/// classes add fields (only methods) and so static casting can be used. The +/// inheritance graph of the subclasses of \ref irept is a useful starting +/// point for working out how to manipulate data. +/// +/// There are three main groups of classes (or APIs); those derived from +/// \ref typet, \ref codet and \ref exprt respectively. Although all of these +/// inherit from \ref irept, these are the most abstract level that code should +/// handle data. If code is manipulating plain `irept`s then something is wrong +/// with the architecture of the code. +/// +/// Many of the key descendants of \ref exprt are declared in \ref std_expr.h. +/// All expressions have a named subexpression with ID "type", which gives the +/// type of the expression (slightly simplified from C/C++ as \ref +/// unsignedbv_typet, \ref signedbv_typet, \ref floatbv_typet, etc.). All type +/// conversions are explicit with a \ref typecast_exprt. One key descendant of +/// \ref exprt is \ref symbol_exprt which creates \ref irept instances with ID +/// “symbol”. These are used to represent variables; the name of which can be +/// found using the `get_identifier` accessor function. +/// +/// \ref codet inherits from \ref exprt and is defined in `std_code.h`. It +/// represents executable code; statements in a C-like language rather than +/// expressions. In the front-end there are versions of these that hold +/// whole code blocks, but in goto-programs these have been flattened so +/// that each \ref irept represents one sequence point (almost one line of +/// code / one semi-colon). The most common descendant of \ref codet is +/// \ref code_assignt so a common pattern is to cast the \ref codet to an +/// assignment and then recurse on the expression on either side. class irept { public: