From a68dabd45f3591456ecf7e35f6a6077db79f6bc6 Mon Sep 17 00:00:00 2001
From: "Darrick J. Wong" <djwong@kernel.org>
Date: Wed, 15 Mar 2023 15:59:35 +0100
Subject: [PATCH] xfs: fix off-by-one error in xfs_btree_space_to_height

Source kernel commit: c0f399ff51495ac8d30367418f4f6292ecd61fbe

Lately I've been stress-testing extreme-sized rmap btrees by using the
(new) xfs_db bmap_inflate command to clone bmbt mappings billions of
times and then using xfs_repair to build new rmap and refcount btrees.
This of course is /much/ faster than actually FICLONEing a file billions
of times.

Unfortunately, xfs_repair fails in xfs_btree_bload_compute_geometry with
EOVERFLOW, which indicates that xfs_mount.m_rmap_maxlevels is not
sufficiently large for the test scenario. For a 1TB filesystem (~67
million AG blocks, 4 AGs) the btheight command reports:

$ xfs_db -c 'btheight -n 4400801200 -w min rmapbt' /dev/sda
rmapbt: worst case per 4096-byte block: 84 records (leaf) / 45 keyptrs (node)
level 0: 4400801200 records, 52390491 blocks
level 1: 52390491 records, 1164234 blocks
level 2: 1164234 records, 25872 blocks
level 3: 25872 records, 575 blocks
level 4: 575 records, 13 blocks
level 5: 13 records, 1 block
6 levels, 53581186 blocks total
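
The per-level figures in the btheight output can be checked by hand with
repeated ceiling division. The following is only an illustrative sketch, not
the xfs_db implementation: tree_geometry() and blocks_needed() are
hypothetical names, and the packing factors (84 records per leaf, 45 keyptrs
per node) are taken from the "worst case" line of the output.

```c
#include <assert.h>

/* Ceiling division: blocks needed to hold `items` at `per_block` each. */
static unsigned long long
blocks_needed(unsigned long long items, unsigned int per_block)
{
	return (items + per_block - 1) / per_block;
}

/*
 * Hypothetical helper (not part of xfs_db): count the levels of a btree
 * holding `nrecs` records when every block is packed as sparsely as the
 * worst case allows.  Stores the total block count in *total_blocks.
 */
static unsigned int
tree_geometry(unsigned long long nrecs, unsigned int leaf_recs,
	      unsigned int node_ptrs, unsigned long long *total_blocks)
{
	unsigned long long blocks;
	unsigned int levels = 1;

	/* A fanout below 2 would never converge to a single root block. */
	assert(leaf_recs >= 1 && node_ptrs >= 2);

	/* Level 0: leaf blocks holding the records themselves. */
	blocks = blocks_needed(nrecs, leaf_recs);
	*total_blocks = blocks;

	/* Each higher level needs one pointer per block of the level below. */
	while (blocks > 1) {
		blocks = blocks_needed(blocks, node_ptrs);
		*total_blocks += blocks;
		levels++;
	}
	return levels;
}
```

Calling tree_geometry(4400801200ULL, 84, 45, &total) walks through 52390491,
1164234, 25872, 575, 13 and 1 blocks, which gives 6 levels and 53581186 blocks
in total, matching the output above.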

The AG is sufficiently large to build this rmap btree. Unfortunately,
m_rmap_maxlevels is 5. Augmenting the loop in the space->height
function to report height, node blocks, and blocks remaining produces
this:

ht 1 node_blocks 45 blockleft 67108863
ht 2 node_blocks 2025 blockleft 67108818
ht 3 node_blocks 91125 blockleft 67106793
ht 4 node_blocks 4100625 blockleft 67015668
final height: 5

The goal of this function is to compute the maximum height btree that
can be stored in the given number of ondisk fsblocks. Starting with the
top level of the tree, each iteration through the loop adds the fanout
factor of the next level down until we run out of blocks. IOWs, maximum
height is achieved by using the smallest fanout factor that can apply
to that level.

However, the loop setup is not correct. Top level btree blocks are
allowed to contain fewer than minrecs items, so the computation is
incorrect because the first time through the loop it should be using a
fanout factor of 2. With this corrected, the above becomes:

ht 1 node_blocks 2 blockleft 67108863
ht 2 node_blocks 90 blockleft 67108861
ht 3 node_blocks 4050 blockleft 67108771
ht 4 node_blocks 182250 blockleft 67104721
ht 5 node_blocks 8201250 blockleft 66922471
final height: 6
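
Both traces can be reproduced with a small standalone sketch of the loop. This
is an approximation for illustration, not the libxfs code: the loop body
(subtract the current level, then multiply by the minimum fanout) is inferred
from the traces, space_to_height_sketch() and its root_fanout parameter are
hypothetical, and the AG size of 67108864 blocks is inferred from the first
"blockleft" value plus one.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the space->height computation.
 *
 * min_fanout:  minimum pointers per non-root node block (45 in the traces)
 * root_fanout: pointers assumed for the root block; the old code
 *              effectively used min_fanout here, the fix uses 2
 * leaf_blocks: number of blocks available for the whole tree
 */
static unsigned int
space_to_height_sketch(unsigned int min_fanout,
		       unsigned long long root_fanout,
		       unsigned long long leaf_blocks)
{
	unsigned long long node_blocks = root_fanout;
	unsigned long long blocks_left;
	unsigned int height = 1;

	/* A fanout below 2 would keep the level size from growing. */
	assert(min_fanout >= 2 && root_fanout >= 1);

	if (leaf_blocks < 1)
		return 0;

	/* One block is consumed by the root. */
	blocks_left = leaf_blocks - 1;

	/* Add one level at a time until the next level no longer fits. */
	while (node_blocks < blocks_left) {
		blocks_left -= node_blocks;
		node_blocks *= min_fanout;
		height++;
	}
	return height;
}
```

With min_fanout = 45 and leaf_blocks = 67108864, a root_fanout of 45 yields a
height of 5, while a root_fanout of 2 yields 6, which agrees with the two
"final height" lines above.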

Fixes: 9ec691205e7d ("xfs: compute the maximum height of the rmap btree when reflink enabled")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Pavel Reichl <preichl@redhat.com>
---
 libxfs/xfs_btree.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 65d38637..38a3092d 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -4663,7 +4663,12 @@ xfs_btree_space_to_height(
 	const unsigned int	*limits,
 	unsigned long long	leaf_blocks)
 {
-	unsigned long long	node_blocks = limits[1];
+	/*
+	 * The root btree block can have fewer than minrecs pointers in it
+	 * because the tree might not be big enough to require that amount of
+	 * fanout. Hence it has a minimum size of 2 pointers, not limits[1].
+	 */
+	unsigned long long	node_blocks = 2;
 	unsigned long long	blocks_left = leaf_blocks - 1;
 	unsigned int		height = 1;
 
--
2.40.0