SparseBitSet

Additional

Language
Java
Version
SparseBitSet-1.0 (May 26, 2014)
Created
Mar 4, 2013
Updated
Apr 16, 2018
Owner
Brett Wooldridge (brettwooldridge)
Contributors
Brett Wooldridge (brettwooldridge)
radai-rosenblatt
incaseoftrouble
BrentDouglas
lfbayer
5
Activity
Badge
Generate
Download
Source code

Show card

SparseBitSet

TL;DR

Basically, if you need to set a large number of bits, or bits at extremely high offsets, you probably want to use this Sparse BitSet. All other alternatives are essentialy off the table; the Java BitSet class is a non-starter. Performance is superior in almost all cases to the standard Java BitSet.

Preface

You know how the internets are; a link that exists today is gone tomorrow. A while ago I had a need for an efficient Sparse BitSet in Java, and found a presentation and code by Dr. Bruce K. Haddon. Going back later, I found the links I had used to find it were dead. Some internet sleuthing later, I found and contacted Dr. Haddon and he was kind enough to send me the presentation again. I have created this project to capture the code for others, as well as the presentation. I can take credit for neither.

Maven

<dependency>
   <groupId>com.zaxxer</groupId>
   <artifactId>SparseBitSet</artifactId>
   <version>1.1</version>
   <scope>compile</scope>
</dependency>

The Problem and Alternatives

The standard Java BitSet is terribly memory inefficient. To store a single bit using BitSet at bit 2 32-1 takes 2 27 32-bit words (2 26 64bit “words”), not counting any Java object overhead.

Using a HashSet of Integers results in (for each bit), 7 32-bit words overhead, or for 64 bits ~448 32-bit words overhead.

Using a HashMap, where the key = bitvalue / 64, and the value is a Long of packed bits, results in (for 64 bits) ~8 32-bit words overhead.

Using a custom hash table, where the key is an int = bitvalue / 64, and the value is a packed long, results in (for 64 bits) ~4 32-bit words overhead.

The Solution: SparseBitSet

Using a virtual-memory like structure, the SparseBitSet overhead is ~0.03 32-bit words overhead per 64 bits.

For a full analysis, read Dr. Haddon's slide stack.