Context
I’m writing this blog post after reading Tom Scheers’s post Writing memory efficient C structs which covers the memory layout of structs and techniques such as bitfields. Today, we will look at another technique which can be used to save memory, and also enable a more natural representation of certain data structures.
Let’s talk about tagged pointers.
Understanding tagged pointers
Tagged pointers are using memory alignment to store additional information.
If on a given architecture int
is 4 bytes, then int
values have to be 4-byte aligned. As the addresses of these values must be a multiple of 4, pointers to such values must necessarily have their two least significant bits set to 0 (if those bits were not 0, the address wouldn’t be a multiple of 4).
// assuming sizeof(int) = 4 on the current architecture
int x; // x must be 4-byte aligned
int *p = &x; // p is pointing to a memory address which is a multiple of 4
// p = 0b........00 : the two least significant bits must be 0 for the address
The idea of tagged pointers is to use these two bits to store information. This will corrupt the pointer, but we know how to retrieve the valid pointer from a corrupted one (we simply set these bits to 0). Looking at the diagram above, if a pointer points to 0x7
, we know that this isn’t a valid address for x
, so we are able to infer that x
actually starts at 0x4
.
Now instead of an int
, you might have a pointer to any data type, whose memory alignment might be larger (and thus allow to use more bits as tags for additional information). In practice, implementing tagged pointers requires to be careful about the specific architecture you are targeting.
Let’s take a real example with the And Inverter Graph (AIG) data structure.
Real-world use case of tagged pointers : AIG
This example is taken from the source code of the ABC project, a software performing various operations to design chips.
The source code presented in this section is available on github.
And Inverter Graphs (AIGs)
And Inverter Graphs (AIGs) are used to represent boolean circuits. They are graphs consisting of inputs, and AND
gates with two incoming wires (called fanins). Moreover, wires can be complemented using NOT
gates.
With AND
and NOT
gates, we can recreate every combinational logic we want. For example:
And the corresponding AIG will be (x
and y
are inputs, the unlabeled node is an AND
gate, and black dots on edges represent a NOT
gate):
flowchart TD ONE@{shape: text, label: "OR(x, y)"} --o A((" ")) A --o B(("x")) A --o C(("y"))
AIGs have been studied extensively1, and are used in electronic design automation (EDA) software, that is, software to design electronic chips.
Edges must be complemented, not nodes
This section can be safely skipped. It justifies the need of having complementable edges (i.e. edges that carry NOT
gates rather than NOT
gates as nodes directly2).
Why not simply complement the nodes themselves you may ask? Well, this would have worked if AIGs were trees, but they are directed acyclic graphs (DAG) whereas trees are undirected acyclic graphs3. Intuitively, in a tree, nodes have exactly one parent (except the root which has no parent), whereas in an AIG, nodes might have multiple parents.
For example, in the following construct (called a miter), node i1
is used by a3
(uncomplemented edge) and a4
(complemented edge).
flowchart TD ONE@{shape: text, label: output} --- O O((XOR)) --o|output 1| A((" ")) A --- B(("a3")) A --o C(("a4")) B --o D((false)) B --- E((i1)) C --o E C --- F((i2)) O -..-o|output 2| E
So, it’s really the edges that might be complemented, not the nodes themselves.
Self-promotion (click to unfold)
Want to get a better understanding on what’s going on here and why is it used for? Take a look at the README of the AIG library I’m building in Rust. I’ll maybe do a blog post on functional equivalence checking later, but if you are interested the docs of the library should hopefully provide some explanations.
A naive implementation without tagged pointers
In this first implementation, an AigEdge
struct was created to carry the complement of an edge.
typedef enum AigNodeType_ AigNodeType;
typedef struct AigEdge_ AigEdge;
typedef struct AigNode_ AigNode;
enum AigNodeType_ {
False,
Input,
And
};
struct AigEdge_ {
AigNode *node;
unsigned int complement : 1;
};
struct AigNode_ {
unsigned int id;
AigNodeType type : 2;
AigEdge *fanin0;
AigEdge *fanin1;
};
unsigned int is_complemented(AigEdge *p) {
return p->complement;
}
The complement is just a boolean, so it is possible to store it in only one bit, and store that bit directly in the fanin pointer instead.
Representing AIG with tagged pointers
Let’s use the least significant bit of the pointers to AigNode
to store the value of complement
.
#include <inttypes.h>
typedef enum AigNodeType_ AigNodeType;
typedef struct AigNode_ AigNode;
enum AigNodeType_ {
False,
Input,
And
};
struct AigNode_ {
unsigned int id;
AigNodeType type : 2;
AigNode *fanin0;
AigNode *fanin1;
};
// Use this to access the valid underlying pointer (with LSB set to 0)
static inline AigNode *regular_node(AigNode *p) {
return (AigNode *)((uint64_t)p & ~1);
}
// Use this to create a complemented edge (with LSB set to 1)
static inline AigNode *complemented_node(AigNode *p) {
return (AigNode *)((uint64_t)p | 1);
}
static inline unsigned int is_complemented(AigNode *p) {
// Just checking for the least significant bit
return (unsigned int)((uint64_t)p & 1);
}
This is actually (a very simplified version of) the implementation provided by ABC.
Now, AigEdge
has completely disappeared, removing one layer of indirection. We just need to be careful when using the fanins, as they might be invalid pointers. For example, to access the id
of a fanin:
unsigned int get_fanin0_id(AigNode *p) {
// Do not do this: p->fanin0 might not be a valid pointer.
// return p->fanin0->id;
// Access the regular pointer instead:
return regular_node(p->fanin0)->id;
}
Note that we need to perform some casts to avoid the compiler yelling at us. Once again, these casts are architecture-dependent.
Digression on representable states
One big problem with both of the codes shown above is that they allow representation of AIGs in an invalid state :
AND
nodes are expected to have two fanins, but that might not be the case- inputs are not supposed to have any fanin, but that might not be the case
- a node might have an invalid type
- …
So many things could go wrong.
Rust enum really shines there, allowing to represent (almost) only valid states (not using tagged pointers in the code below):
struct AigEdge {
// Rust forces us to use reference counters,
// which we need to do in the C implementation anyway
node: Rc<RefCell<AigNode>>,
complement: bool,
}
enum AigNode {
False,
Input(u32),
And {
id: u32,
fanin0: AigEdge,
fanin1: AigEdge,
},
}
This way, the integrity of the AIG is statically guaranteed!
(This is a lie, because the integrity of the AIG is more than just some requirements about individual nodes. An AIG must not contain any cycle, which is a property we must check at the AIG scale and not only at each node separately. But there are still less things that could go wrong.)
Footnotes
-
Some references :
- Mishchenko, A., Chatterjee, S., & Brayton, R. (2006, July). DAG-aware AIG rewriting a fresh look at combinational logic synthesis. In Proceedings of the 43rd annual Design Automation Conference (pp. 532-535).
- Bjesse, P., & Boralv, A. (2004, November). DAG-aware circuit compression for formal verification. In IEEE/ACM International Conference on Computer Aided Design, 2004. ICCAD-2004. (pp. 42-49). IEEE.
- Mishchenko, A., Chatterjee, S., Brayton, R., & Een, N. (2006, November). Improvements to combinational equivalence checking. In Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design (pp. 836-843).
-
I mean it would work to have
NOT
gates as nodes in the AIG, but it will require potentially twice as nodes as ifNOT
gates are carried by edges. ↩ -
Trees are undirected connected acyclic graphs. ↩