AVL Heap Library

High-Performance Binary Tree with Configurable Balance Bits

Overview

The AVL Heap Library provides a high-performance, memory-efficient implementation of an AVL tree stored in a heap-allocated array. Unlike traditional pointer-based AVL trees, this implementation uses array indexing for navigation, resulting in better cache locality and reduced memory overhead.

Key Innovation

The AVL Heap requires a small, fixed-size data structure with two balance bits embedded in that structure; in return, it saves 2 64-bit pointers per data item over traditional AVL trees while keeping its data items in sorted order. The smallest data structure (i.e. 1 64-bit word minus the 2 balance bits) should be as fast as a traditional AVL tree. As the data structure grows to more 64-bit words, AVL Heap insertion and deletion slows down since copying/moving up to one data structure per tree level (AVL Heap) is slower than changing one pointer per tree level (AVL tree).

I provided the idea, then Claude.ai's Sonnet 4.5 created this library and almost all of the documentation. I have not reviewed Claude's code yet, so beware -- there may be bugs. A heap can encode different types of trees; I'm guessing that red-black trees, quad trees, etc. can also be encoded as heaps with similar-to-the-AVL-Heap advantages and disadvantages.

🚀

Memory Efficient

Arena-based allocation with on-demand memory commitment. Balance bits embedded in data.

Cache Friendly

Array-based storage provides excellent cache locality compared to pointer-chasing.

🎯

Configurable

Choose which bits store balance information based on your data alignment requirements.

📊

Multi-Word

Support for elements with multiple 64-bit words for complex data structures.

Features

Core Capabilities

Files

Balance Bit Configurations

Configuration Use Case Constraint Usable Range
BALANCE_TOP_BITS Integer keys, counters, IDs key < 262 0 to 4,611,686,018,427,387,903
BALANCE_BOTTOM_BITS Pointers, aligned addresses 4-byte aligned (bottom 2 bits = 0) Any aligned value

API Reference

Data Types

typedef enum {
    BALANCE_TOP_BITS,    /* Use bits 62-63 (MSB) */
    BALANCE_BOTTOM_BITS  /* Use bits 0-1 (LSB) */
} BalanceBitLocation;

typedef struct AVLHeap AVLHeap;

Initialization

AVLHeap* avlh_init(
    const char *name,              /* Descriptive name for debugging */
    size_t initial_capacity,        /* Initial array size */
    size_t element_words,           /* Words per element (min 1) */
    size_t max_memory,              /* Max arena size (0 = 1 GiB) */
    size_t balance_word_idx,        /* Which word has balance bits */
    BalanceBitLocation balance_location  /* TOP or BOTTOM bits */
);

Core Operations

/* Add element */
int avlh_add(AVLHeap *avlh, uint64_t key, const uint64_t *data);

/* Check existence */
int avlh_find(AVLHeap *avlh, uint64_t key);

/* Get data pointer */
uint64_t* avlh_find_element(AVLHeap *avlh, uint64_t key);

/* Delete element */
int avlh_delete(AVLHeap *avlh, uint64_t key);

/* Free memory */
void avlh_free(AVLHeap *avlh);

Traversal & Statistics

void avlh_print_inorder(AVLHeap *avlh);
void avlh_print_preorder(AVLHeap *avlh);
void avlh_print_postorder(AVLHeap *avlh);
void avlh_stats(AVLHeap *avlh);
size_t avlh_get_size(AVLHeap *avlh);

Usage Examples

Example 1: Integer Keys (TOP bits)

/* Create heap for user IDs */
AVLHeap *ids = avlh_init(
    "UserIDs",
    16,                      /* initial capacity */
    1,                       /* 1 word per element */
    0,                       /* default memory limit */
    0,                       /* balance bits in word 0 */
    BALANCE_TOP_BITS         /* use top 2 bits */
);

/* Add some user IDs */
avlh_add(ids, 100, NULL);
avlh_add(ids, 200, NULL);
avlh_add(ids, 150, NULL);

/* Check if ID exists */
if (avlh_find(ids, 150)) {
    printf("User 150 found!\n");
}

/* Clean up */
avlh_free(ids);

Example 2: Key-Value Pairs (TOP bits)

/* Create heap for employee salaries */
AVLHeap *salaries = avlh_init(
    "Salaries",
    16, 2, 0,           /* 2 words: key + salary */
    0, BALANCE_TOP_BITS
);

/* Add employee data */
uint64_t salary1[] = {75000};
uint64_t salary2[] = {120000};

avlh_add(salaries, 1001, salary1);  /* Employee 1001: $75k */
avlh_add(salaries, 1005, salary2);  /* Employee 1005: $120k */

/* Retrieve salary */
uint64_t *sal = avlh_find_element(salaries, 1005);
if (sal) {
    printf("Employee 1005 earns: $%llu\n", sal[0]);
}

avlh_free(salaries);

Example 3: Pointer Addresses (BOTTOM bits)

/* Create heap for memory addresses */
AVLHeap *ptrs = avlh_init(
    "Pointers",
    16, 1, 0,
    0, BALANCE_BOTTOM_BITS  /* use bottom 2 bits */
);

/* Add aligned addresses (must be 4-byte aligned) */
avlh_add(ptrs, 0x1000, NULL);  /* OK: aligned */
avlh_add(ptrs, 0x2000, NULL);  /* OK: aligned */
avlh_add(ptrs, 0x1001, NULL);  /* FAIL: not aligned */

avlh_free(ptrs);
Important Constraints

BALANCE_TOP_BITS: Keys must be less than 262
BALANCE_BOTTOM_BITS: Keys must be 4-byte aligned (bottom 2 bits = 0)

Advanced Configuration

Multi-Word Elements

You can store multiple 64-bit words per element and place the balance bits in any word:

/* Balance in first word */
AVLHeap *h1 = avlh_init("Config1", 16, 3, 0, 0, BALANCE_TOP_BITS);

/* Balance in last word */
AVLHeap *h2 = avlh_init("Config2", 16, 3, 0, 2, BALANCE_TOP_BITS);

/* Balance in middle word with bottom bits */
AVLHeap *h3 = avlh_init("Config3", 16, 3, 0, 1, BALANCE_BOTTOM_BITS);

uint64_t data[] = {1111, 2222};
avlh_add(h1, 100, data);
avlh_add(h2, 200, data);
avlh_add(h3, 0x1000, data);  /* Must be aligned */

Memory Management

The library uses an arena-based memory allocator with on-demand page commitment:

Memory Efficiency

The arena allocator minimizes system calls and fragmentation. Virtual address space is cheap; physical memory is committed only as needed. This allows heaps to grow efficiently without frequent reallocation.

Performance Characteristics

Time Complexity

Operation Average Case Worst Case
Insert O(log n) O(log n)
Delete O(log n) O(log n)
Search O(log n) O(log n)
Traversal O(n) O(n)

Space Complexity

Advantages vs. Traditional AVL Trees

Better Cache Locality

Array-based storage keeps nodes close together, reducing cache misses

Lower Memory Overhead

No pointer storage, balance bits embedded in data words

Simpler Memory Management

Single arena allocation instead of per-node malloc/free

Predictable Performance

No malloc/free during operations, just index arithmetic

Trade-offs

This implementation excels when elements have unused bits for balance storage and when cache locality matters more than worst-case memory usage. For sparse trees with many unused array slots, a traditional pointer-based AVL tree may be more space-efficient.