Weight Tree — An Experimental Neural Network Weight Storage Method.

A developer proposes an experimental method for storing neural network weights using a tree structure that stores unique values and allows weights to reference them by index, potentially reducing memory usage when many weights share identical values. The approach organizes values hierarchically by similarity, enabling automatic cleanup of unused nodes, but may offer little benefit for models with nearly unique weights or small networks.

In this document, I would like to present an experimental method for storing neural network weights that I came up with. Today, the weights of most neural networks are stored in a format similar to this PSEUDOCODE : { "model architecture": "Dense Layer 1", "data type": "float32", "weights": 0.1542, -0.2341, 0.8912 , 0.0045, 0.5421, -0.1129 } Or simply in a binary format, which makes storing large numbers of weights relatively expensive in terms of SSD/HDD storage. My idea is to store a tree of weight values , while the actual weights only store references indices to values inside that tree. { "model architecture": "Dense Layer 1", "data type": "float32", "weight tree": { "0": { "value": 0.01, "elements": { "0.1": { "value": 0.011, "elements": {} } } }, "1": { "value": 0.02, "elements": {} } }, "weights": 0, 1, 0.1 } Instead of storing floating-point values directly, the weights array stores references to entries inside weight tree . Unlike a regular dictionary, a tree can organize values according to their similarity. For example: 0.01 ├── 0.011 ├── 0.012 └── 0.013 During lookup, the algorithm can quickly locate the closest existing value and decide whether to reuse it or create a new node. Whenever a weight needs to be updated, the algorithm first searches for the closest matching value inside weight tree . - If an identical value already exists, the weight simply points to that existing node. - Otherwise, a new node is created. As a result, millions of weights can reference the same value if they are identical. The idea behind the tree is that new values are created near existing ones . Suppose the tree already contains 0.01 During training, a new value appears: 0.011 It can be inserted as a child node: 0.01 └── 0.011 Later, if another value appears: 0.0115 the tree becomes 0.01 └── 0.011 └── 0.0115 In other words, the tree depth naturally increases as the precision of stored values increases. Searching starts from higher levels and gradually moves toward more precise values. This organization may reduce the amount of duplicated or nearly identical values stored in memory. Example: 0.0 ├── 0.01 │ ├── 0.011 │ ├── 0.012 │ └── 0.013 │ ├── 0.02 │ ├── 0.021 │ └── 0.022 │ └── -0.01 ├── -0.011 └── -0.012 Meanwhile, the model itself stores only references: weights 5, 12, 5, 8, 12, 12, 1, ... where each number represents the identifier of a node inside the tree. Each node may keep track of how many weights reference it. { "value": 0.011, "references": 152384 } When the reference count reaches zero, the node can be removed automatically. This allows unused values to be cleaned up over time. Searching for a node: O h where h is the height of the tree. Retrieving a value by its index is nearly instantaneous: weights i ↓ weight tree id ↓ value - Find the current value. - Search for the new value. - If it exists, update the reference. - Otherwise, create a new tree node. + Reduces memory usage when many weights share identical values. + Allows millions of parameters to reuse the same stored value. + The tree itself can be compressed further. - Provides little or no benefit when almost every float value is unique. - The tree must be updated continuously during training. - Requires additional memory for the tree structure. - Tree lookups are slower than direct array access. This approach is unlikely to provide significant benefits if: - the model stores raw float32 values without quantization; - nearly every weight is unique; - the neural network is relatively small. Future versions of the algorithm could include: - automatically merging very similar values; - limiting the maximum tree depth; - keeping only the most frequently used values; - combining the method with quantization; - periodically rebuilding the tree after several training epochs. Weight Tree is an experimental approach to storing neural network weights by using a shared tree of values instead of storing an independent floating-point number for every parameter. The method is expected to be most effective when many weights share identical or very similar values. In such cases, instead of storing millions of duplicate numbers, only a single value needs to exist in the tree while all corresponding weights reference it. On the other hand, if most weights are unique, the tree will continue to grow and the memory savings may disappear. For this reason, the approach is likely to be most effective when combined with quantization, rounding, or other techniques that reduce the number of unique values. It is also important to note that this method has not been implemented or experimentally evaluated yet. At the moment, Weight Tree exists purely as an experimental concept for exploring alternative ways of storing neural network weights. Overall, Weight Tree should not be viewed as a replacement for existing weight storage formats, but rather as an additional compression technique that may complement modern model optimization methods. Copyright c 2026 dev-er1 Licensed under Creative Commons Attribution 4.0 International CC BY 4.0 .