Context

I’m writing this blog post after reading Tom Scheers’s post Writing memory efficient C structs which covers the memory layout of structs and techniques such as bitfields. Today, we will look at another technique which can be used to save memory, and also enable a more natural representation of certain data structures.

Let’s talk about tagged pointers.

Understanding tagged pointers

Tagged pointers are using memory alignment to store additional information.

If on a given architecture int is 4 bytes, then int values have to be 4-byte aligned. As the addresses of these values must be a multiple of 4, pointers to such values must necessarily have their two least significant bits set to 0 (if those bits were not 0, the address wouldn’t be a multiple of 4).

// assuming sizeof(int) = 4 on the current architecture
int x; // x must be 4-byte aligned
int *p = &x; // p is pointing to a memory address which is a multiple of 4
// p = 0b........00 : the two least significant bits must be 0 for the address

The idea of tagged pointers is to use these two bits to store information. This will corrupt the pointer, but we know how to retrieve the valid pointer from a corrupted one (we simply set these bits to 0). Looking at the diagram above, if a pointer points to 0x7, we know that this isn’t a valid address for x, so we are able to infer that x actually starts at 0x4.

Now instead of an int, you might have a pointer to any data type, whose memory alignment might be larger (and thus allow to use more bits as tags for additional information). In practice, implementing tagged pointers requires to be careful about the specific architecture you are targeting.

Let’s take a real example with the And Inverter Graph (AIG) data structure.

Real-world use case of tagged pointers : AIG

This example is taken from the source code of the ABC project, a software performing various operations to design chips.

The source code presented in this section is available on github.

And Inverter Graphs (AIGs)

And Inverter Graphs (AIGs) are used to represent boolean circuits. They are graphs consisting of inputs, and AND gates with two incoming wires (called fanins). Moreover, wires can be complemented using NOT gates.

With AND and NOT gates, we can recreate every combinational logic we want. For example:

OR (x, y) = NOT (AND [NOT (x), NOT (y)])

And the corresponding AIG will be (x and y are inputs, the unlabeled node is an AND gate, and black dots on edges represent a NOT gate):

flowchart TD
	ONE@{shape: text, label: "OR(x, y)"} --o A((" "))
	A --o B(("x"))
	A --o C(("y"))

AIGs have been studied extensively¹, and are used in electronic design automation (EDA) software, that is, software to design electronic chips.

Edges must be complemented, not nodes

This section can be safely skipped. It justifies the need of having complementable edges (i.e. edges that carry NOT gates rather than NOT gates as nodes directly²).

Why not simply complement the nodes themselves you may ask? Well, this would have worked if AIGs were trees, but they are directed acyclic graphs (DAG) whereas trees are undirected acyclic graphs³. Intuitively, in a tree, nodes have exactly one parent (except the root which has no parent), whereas in an AIG, nodes might have multiple parents.

For example, in the following construct (called a miter), node i1 is used by a3 (uncomplemented edge) and a4 (complemented edge).

flowchart TD
	ONE@{shape: text, label: output} --- O
	O((XOR)) --o|output 1| A((" "))
	A --- B(("a3"))
	A --o C(("a4"))
	B --o D((false))
	B --- E((i1))
	C --o E
	C --- F((i2))
	O -..-o|output 2| E

So, it’s really the edges that might be complemented, not the nodes themselves.

Self-promotion (click to unfold)

Want to get a better understanding on what’s going on here and why is it used for? Take a look at the README of the AIG library I’m building in Rust. I’ll maybe do a blog post on functional equivalence checking later, but if you are interested the docs of the library should hopefully provide some explanations.

A naive implementation without tagged pointers

In this first implementation, an AigEdge struct was created to carry the complement of an edge.

typedef enum AigNodeType_ AigNodeType;
typedef struct AigEdge_ AigEdge;
typedef struct AigNode_ AigNode;
 
enum AigNodeType_ {
	False,
	Input,
	And
};
 
struct AigEdge_ {
	AigNode *node;
	unsigned int complement : 1;
};
 
struct AigNode_ {
	unsigned int id;
	AigNodeType type : 2;
	AigEdge *fanin0;
	AigEdge *fanin1;
};
 
unsigned int is_complemented(AigEdge *p) {
	return p->complement;
}

The complement is just a boolean, so it is possible to store it in only one bit, and store that bit directly in the fanin pointer instead.

Representing AIG with tagged pointers

Let’s use the least significant bit of the pointers to AigNode to store the value of complement.

#include <inttypes.h>
 
typedef enum AigNodeType_ AigNodeType;
typedef struct AigNode_ AigNode;
 
enum AigNodeType_ {
	False,
	Input,
	And
};
 
struct AigNode_ {
	unsigned int id;
	AigNodeType type : 2;
	AigNode *fanin0;
	AigNode *fanin1;
};
 
// Use this to access the valid underlying pointer (with LSB set to 0)
static inline AigNode *regular_node(AigNode *p) {
	return (AigNode *)((uint64_t)p & ~1);
}
 
// Use this to create a complemented edge (with LSB set to 1)
static inline AigNode *complemented_node(AigNode *p) {
	return (AigNode *)((uint64_t)p | 1);
}
 
static inline unsigned int is_complemented(AigNode *p) {
	// Just checking for the least significant bit
	return (unsigned int)((uint64_t)p & 1);
}

This is actually (a very simplified version of) the implementation provided by ABC.

Now, AigEdge has completely disappeared, removing one layer of indirection. We just need to be careful when using the fanins, as they might be invalid pointers. For example, to access the id of a fanin:

unsigned int get_fanin0_id(AigNode *p) {
	// Do not do this: p->fanin0 might not be a valid pointer.
	// return p->fanin0->id;
 
	// Access the regular pointer instead:
	return regular_node(p->fanin0)->id;
}

Note that we need to perform some casts to avoid the compiler yelling at us. Once again, these casts are architecture-dependent.

Digression on representable states

One big problem with both of the codes shown above is that they allow representation of AIGs in an invalid state :

AND nodes are expected to have two fanins, but that might not be the case
inputs are not supposed to have any fanin, but that might not be the case
a node might have an invalid type
…

So many things could go wrong.

Rust enum really shines there, allowing to represent (almost) only valid states (not using tagged pointers in the code below):

struct AigEdge {
	// Rust forces us to use reference counters,
	// which we need to do in the C implementation anyway
	node: Rc<RefCell<AigNode>>,
	complement: bool,
}
 
enum AigNode {
	False,
	Input(u32),
	And {
		id: u32,
		fanin0: AigEdge,
		fanin1: AigEdge,
	},
}

This way, the integrity of the AIG is statically guaranteed!

(This is a lie, because the integrity of the AIG is more than just some requirements about individual nodes. An AIG must not contain any cycle, which is a property we must check at the AIG scale and not only at each node separately. But there are still less things that could go wrong.)

Some references :
- Mishchenko, A., Chatterjee, S., & Brayton, R. (2006, July). DAG-aware AIG rewriting a fresh look at combinational logic synthesis. In Proceedings of the 43rd annual Design Automation Conference (pp. 532-535).
- Bjesse, P., & Boralv, A. (2004, November). DAG-aware circuit compression for formal verification. In IEEE/ACM International Conference on Computer Aided Design, 2004. ICCAD-2004. (pp. 42-49). IEEE.
- Mishchenko, A., Chatterjee, S., Brayton, R., & Een, N. (2006, November). Improvements to combinational equivalence checking. In Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design (pp. 836-843).
↩
I mean it would work to have NOT gates as nodes in the AIG, but it will require potentially twice as nodes as if NOT gates are carried by edges. ↩
Trees are undirected connected acyclic graphs. ↩

Raimmy's blog

Posts

Tagged pointers in action

Table of Contents

Understanding tagged pointers

Real-world use case of tagged pointers : AIG

And Inverter Graphs (AIGs)

Edges must be complemented, not nodes

A naive implementation without tagged pointers

Representing AIG with tagged pointers

Digression on representable states

Table of Contents

Raimmy's blog

Posts

Tagged pointers in action

Table of Contents

Understanding tagged pointers

Real-world use case of tagged pointers : AIG

And Inverter Graphs (AIGs)

Edges must be complemented, not nodes

A naive implementation without tagged pointers

Representing AIG with tagged pointers

Digression on representable states

Footnotes

Table of Contents