[oi-dev] [developer] zfs on illumos needs an update to support specific slice with -d
Toomas Soome
tsoome at me.com
Wed Aug 27 08:12:55 UTC 2025
> On 27. Aug 2025, at 01:40, Atiq Rahman <atiqcx at gmail.com> wrote:
>
> More details on the setup to repro,
>
> (It's a spare disk I got from a friend to be able to do some illumos / OI installer related things and to backup some data.)
>
> The disk in question was partitioned using Microsoft's diskpart utility, probably a few years ago, with a single NTFS partition spanning the entire 2TB drive.
>
> To tear down the old partition table I used the latest parted and did
> - wipefs
> - mklabel gpt
> - created one partition for ZFS with type code BF01
>
> I booted into the OI live env and created the zfs pool on that partition. Think about it: I created the zfs pool on illumos, yet every time I subsequently tried to import this pool using OI it would report that the pool is corrupt! To check whether it really had corruption I booted into Linux; openzfs also reports the pool as corrupt.
>
> But openzfs imported it fine when I specified -d /dev/sda1; all data was intact, no errors. (It would report it as corrupt when I specified -d /dev/sda on openzfs though.)
In the Linux world, /dev/sda is the whole disk; it is the same as /dev/dsk/cXtYd0 or /dev/dsk/cXtYd0p0 in illumos. /dev/sda1 is the first partition on sda.
Now, if you think about it a bit: a zfs pool on disk has a 4MB area in front of the pool data area. That 4MB contains the pool labels (2 copies, 256KB each) and a 3.5MB reserved area for the boot program.
On a GPT-partitioned disk (512B sector size), you have:
sector 0 for the PMBR
sector 1 for the GPT header
sectors 2-33 for the GPT partition array, so the first *usable* sector for your partition data is sector 34:
format> ver
Volume name = < >
ascii name = <LSI-MR9361-8i-4.68-557.86GB>
bytes/sector = 512
sectors = 1169920000
accessible sectors = 1169903549
first usable sector = 34
So, if you create GPT on a disk, your first 34 sectors are used by GPT; that's 34 * 512 = 17,408 bytes. If you create the pool on /dev/sda, that means your pool label (that first 4MB area) will overwrite [parts of] the GPT.
If you create a slice on GPT, let's say starting at the first available sector, 34, then your pool labels start from sector 34, and the data area starts at sector 34 + 4MB (that's 4194304 / 512 = 8192 sectors), i.e. sector 34 + 8192 = 8,226.
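To make the arithmetic explicit (nothing new here, just the numbers from above redone in any POSIX shell):

$ echo $((34 * 512))               # space taken by PMBR + GPT header + partition array
17408
$ echo $((2 * 262144 + 3670016))   # two front labels (256KB each) + 3.5MB boot area
4194304
$ echo $((4194304 / 512))          # that 4MB label/boot area expressed in 512B sectors
8192
$ echo $((34 + 8192))              # first data sector of a pool whose labels start at sector 34
8226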
Now, if you have the pool on your first slice (starting from sector 34), but you are going to use sda to read this pool, then zfs fails to read the pool labels, because they are actually located starting from sector 34, not sector 0, and you will get an error.
Once more, zpool import -d has nothing to do with whether we are able to somehow read a corrupted pool; its only purpose is to tell where to look.
Now, there is one very confusing scenario.
Let's assume you have a disk with GPT partitioning and you created a pool using the whole disk (or you had a pool created on the whole disk and you created GPT on it later).
What happens in such cases is that zpool create does not actually clean up the first 4MB: the label area has reserved blocks, and the space under those blocks is not touched at all. So whatever was there before, or was stored there after zpool create, stays as is. Therefore you now have a pool that starts from absolute sector 0 and has bits of partition table structures “embedded” in it. Maybe some label data areas are corrupted now, maybe not.
So, you now have the pool and the partition table sharing the same space, and the pool is not imported. Now you have slice /dev/sda1, and you create a new pool on sda1. What happens? Your disk still had a pool starting from absolute sector 0, its size covering the whole physical disk. And you have also created a second pool in sda1 - so the second pool is written on top of the first one, corrupting whatever data was there.
What happens when you try to get the list of importable pools? If zpool import starts scanning from absolute sector 0, it will find the information about the first pool, but depending on what was overwritten by the second pool, it may report errors or be completely happy… If you manage to import both pools, then you have very “interesting” times ahead…
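A quick way to see whether you have ended up with such overlapping pools (a sketch; cXtYd0 and s0 are placeholders for your own disk and slice) is to ask for labels at both possible starting points, the whole disk and the slice:

# zdb -l /dev/rdsk/cXtYd0p0    (labels of a pool written from absolute sector 0)
# zdb -l /dev/rdsk/cXtYd0s0    (labels of a pool written inside the first slice)

If both commands print a pool configuration, you have two pools sharing the same disk.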
It is very important to actually understand what you are doing, which devices you are using, and how it affects your system's behavior. Also, if you have experimented with different setups, especially whole physical disks versus partitioned disks, and want to start over, wipe the disk clean of any remaining data structures from your file systems. Partitioning software does not wipe the unused blocks.
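A minimal sketch of such a cleanup with dd (destructive, so double-check the device name, which is a placeholder here; this assumes a 512B-sector disk and wipes the first 16MB, enough to cover the GPT structures and the front copies of the zfs labels):

# dd if=/dev/zero of=/dev/rdsk/cXtYd0p0 bs=1024k count=16

The backup GPT and the two trailing zfs labels live at the end of the disk, so zero a few MB there as well (dd with seek= computed from the disk size), or use 'zpool labelclear' on the old devices if your zpool supports it, before repartitioning.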
rgds,
toomas
> Until I zero'd out the beginning and end of the partition using dd, both illumos and Linux were showing NTFS as the partition type in print output.
>
> Hope it helps understand the case better,
> Atiq
>
> On Tue, Aug 26, 2025 at 3:21 PM Atiq Rahman <atiqcx at gmail.com <mailto:atiqcx at gmail.com>> wrote:
>> > If one implementation can access the pool just fine, scrub does confirm all is ok, then the pool is ok for that implementation; if you still can not import it to other system, then it means this pool has unsupported [on that second implementation] features. Normally you are told about it loud and clear, however. Without exact messages it is impossible to tell what is actually going on.
>>
>> Neither. In my scenario, both illumos zfs and openzfs interpreted the pool as corrupt due to metadata (until I zeroed out and recreated the pool), but openzfs (Linux) was still able to import it (both before and after zeroing out) because its -d argument supports importing from a specific partition. But in these cases illumos acts very unfriendly.
>>
>>
>> On Tue, Aug 26, 2025 at 4:59 AM Toomas Soome via illumos-developer <developer at lists.illumos.org <mailto:developer at lists.illumos.org>> wrote:
>>>
>>>
>>>> On 26. Aug 2025, at 13:53, Atiq Rahman <atiqcx at gmail.com <mailto:atiqcx at gmail.com>> wrote:
>>>>
>>>> Hi Toomas,
>>>> With openzfs I can import a pool from a specific partition even if 'zpool import' states pool is corrupt.
>>>> With Illumos zfs I can't. That's what I mean by the subject of this email.
>>>>
>>>
>>> Whether you can actually import the pool or not depends on the corruption. If pool metadata is not readable or is not understandable (feature is not implemented), then you can not access the pool.
>>>
>>> If one implementation can access the pool just fine, scrub does confirm all is ok, then the pool is ok for that implementation; if you still can not import it to other system, then it means this pool has unsupported [on that second implementation] features. Normally you are told about it loud and clear, however. Without exact messages it is impossible to tell what is actually going on.
>>>
>>> rgds,
>>> toomas
>>>
>>>> You have provided nice details, I need to peruse them when I get some free time.
>>>>
>>>> Thanks again,
>>>> Atiq
>>>> On Tue, Aug 26, 2025 at 3:37 AM Toomas Soome via illumos-developer <developer at lists.illumos.org <mailto:developer at lists.illumos.org>> wrote:
>>>>>
>>>>>
>>>>>> On 25. Aug 2025, at 09:49, Atiq Rahman <atiqcx at gmail.com <mailto:atiqcx at gmail.com>> wrote:
>>>>>>
>>>>>> > If it's not finding the pool, then that's the problem you need to look at.
>>>>>>
>>>>>> yep, it doesn't see the pool, or it says the pool is corrupt. This repros with a zfs pool on an external disk with one partition (the pool is only on that specific partition). Probably illumos thinks the pool is on the whole disk or something, while Linux can import/mount it fine (you have to provide the partition number with -d though).
>>>>>
>>>>>
>>>>> openzfs: zpool import [-D] [-d dir|device]…
>>>>>
>>>>> there the "device" is added for cases when it is desirable to limit the search:
>>>>>
>>>>> commit 522db29275b81c18c2bf53a95efa1aedeb13b428
>>>>> Author: Chunwei Chen <tuxoko at gmail.com <mailto:tuxoko at gmail.com>>
>>>>> Date: Fri Jan 26 10:49:46 2018 -0800
>>>>>
>>>>> zpool import -d to specify device path
>>>>>
>>>>> When we know which devices have the pool we are looking for, sometime
>>>>> it's better if we can directly pass those device paths to zpool import
>>>>> instead of letting it to search through all unrelated stuff, which might
>>>>> take a lot of time if you have hundreds of disks.
>>>>>
>>>>> This patch allows option -d <dev_path> to zpool import. You can have
>>>>> multiple pairs of -d <dev_path>, and zpool import will only search
>>>>> through those devices. For example:
>>>>>
>>>>> zpool import -d /dev/sda -d /dev/sdb
>>>>>
>>>>> Reviewed-by: Tony Hutter <hutter2 at llnl.gov <mailto:hutter2 at llnl.gov>>
>>>>> Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov <mailto:behlendorf1 at llnl.gov>>
>>>>> Signed-off-by: Chunwei Chen <david.chen at nutanix.com <mailto:david.chen at nutanix.com>>
>>>>> Closes #7077
>>>>>
>>>>> From the point of view of *finding* the pool, it does not really matter whether you search a directory tree or a specific device.
>>>>>>
>>>>>> I guess the illumos code for reading GPT partitions is not up to date. So it confuses me by reading some MBR records, while those are simply not valid on a GPT partition. I fixed that by zeroing out (dd) the first few megabytes and last few megabytes of the partition. Then illumos can read it; wipefs wasn't enough.
>>>>>>
>>>>>
>>>>> How did MBR records appear on a *GPT partition*?! The MBR is in absolute sector 0 of the disk, and that's it. From the GPT point of view, you still have an MBR in absolute sector 0, but it must be configured to have one partition that covers the entire space an MBR partition can cover. It is called the Protective MBR, and the reason it is there is to prevent MBR-only tools from touching the disk (because the whole disk is already partitioned as far as those tools are concerned). The GPT table itself is located in absolute sector 1, followed by the partition array in the next sectors (and the backup table is stored at the end of the disk).
>>>>>
>>>>> There is nothing "not up to date" about reading GPT partitions. If you want to check the *exact* data that is really on the disk, you can do:
>>>>>
>>>>> mdb /dev/rdsk/cXtYd0 — replace the device name with your actual disk name; you can use *d0 if you have GPT, or *p0 for any disk.
>>>>> ::load disk_label
>>>>>
>>>>> if your disk is using a 4k sector size:
>>>>> > ::sectorsize
>>>>> Current sector size is 512 (0x200)
>>>>> > ::sectorsize 1000
>>>>> > ::sectorsize
>>>>> Current sector size is 4096 (0x1000)
>>>>> >
>>>>>
>>>>> (mdb defaults to hex, so the 1000 above means 0x1000 = 4096).
>>>>>
>>>>> then you can check the partition tables:
>>>>> > ::mbr
>>>>> ..
>>>>> PART TYPE ACTIVE STARTCHS ENDCHS SECTOR NUMSECT
>>>>> 0 EFI_PMBR:0xee 0 0/0/2 1023/255/63 1 1169919999
>>>>> 1 UNUSED:0
>>>>> 2 UNUSED:0
>>>>> 3 UNUSED:0
>>>>> >
>>>>>
>>>>> There you can see Protective MBR.
>>>>>
>>>>> > ::gpt
>>>>> ...
>>>>> PART TYPE STARTLBA ENDLBA ATTR NAME
>>>>> 0 EFI_SYSTEM 256 524543 0 loader
>>>>> 1 EFI_USR 524544 1169903582 0 zfs
>>>>> 2 EFI_UNUSED
>>>>> 3 EFI_UNUSED
>>>>> 4 EFI_UNUSED
>>>>> 5 EFI_UNUSED
>>>>> 6 EFI_UNUSED
>>>>> 7 EFI_UNUSED
>>>>> 8 EFI_RESERVED 1169903583 1169919966 0
>>>>> > ::quit
>>>>>
>>>>> There you can see my boot disk layout, with the ESP in s0, the zpool in s1, and s8 reserved for internal data. Since illumos translates GPT partitions to the VTOC API (Virtual Table of Contents), this also limits the number of slices illumos is willing to show you. Do not put illumos-related data past slice 8 unless you want challenges in your life. (Yes, this sucks, but it won't get changed anytime soon.)
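>>>>>
>>>>> If you want to cross-check how illumos presents that GPT label as slices (a quick sketch; replace c1t0d0 with your own disk), you can look at the VTOC-style view and at the slice device nodes it exposes:
>>>>>
>>>>> # prtvtoc /dev/rdsk/c1t0d0
>>>>> # ls /dev/dsk/c1t0d0s*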
>>>>>
>>>>>> Those MBR bytes should not have been looked at (since the disk is using a GPT partition table), so illumos or its zfs is not doing it right. And while openzfs has -d to facilitate an import in these cases, illumos doesn't have that.
>>>>>>
>>>>>
>>>>> To check for the pool, you start from ‘zpool import’; if it does not show an importable pool, there is something wrong. *Assuming* your partition table is good, the next step is to check the data.
>>>>>
>>>>> So, if you want to verify whether the slice has a pool on it, let's browse the pool labels:
>>>>>
>>>>> # zdb -l /dev/rdsk/c1t0d0s1
>>>>> ------------------------------------
>>>>> LABEL 0
>>>>> ------------------------------------
>>>>> version: 5000
>>>>> name: 'rpool'
>>>>> state: 0
>>>>> txg: 9701129
>>>>> pool_guid: 17731126654892364754
>>>>> errata: 0
>>>>> hostid: 1326551839
>>>>>
>>>>> …
>>>>>
>>>>> labels = 0 1 2 3
>>>>>
>>>>> Above is an example of what you would expect to see; in label 0 it did indeed find a configuration for a pool, and you can see the version and whatever else is recorded there. The last line tells us there is identical content in all 4 copies of the pool labels. That's what we hope to see. If not, there is no recognizable pool in this slice (a slice must be at least 64MB).
>>>>>
>>>>> If 'zdb -l' does show the pool label, then you should get some output from ‘zpool import’.
>>>>>
>>>>> rgds,
>>>>> toomas
>>>>>
>>>>>> Best,
>>>>>> Atiq
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 21, 2025 at 2:03 AM Peter Tribble <peter.tribble at gmail.com <mailto:peter.tribble at gmail.com>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 21, 2025 at 8:45 AM Atiq Rahman <atiqcx at gmail.com <mailto:atiqcx at gmail.com>> wrote:
>>>>>>>> Hi,
>>>>>>>> Say, my data pool is on slice 7 which is /dev/dsk/c3t0d0s6
>>>>>>>>
>>>>>>>> Often times, on external disk, on both illumos and Linux, if I do this or equivalent
>>>>>>>>
>>>>>>>> $ sudo zpool import dpool -d /dev/dsk/c3t0d0
>>>>>>>> it says the pool is corrupt.
>>>>>>>>
>>>>>>>> However, on Linux, I can do this,
>>>>>>>> $ sudo zpool import dpool -d /dev/sda7
>>>>>>>> and Linux obliges nicely.
>>>>>>>>
>>>>>>>> However, illumos doesn't support this syntax,
>>>>>>>> $ sudo zpool import dpool -d /dev/dsk/c3t0d0s6
>>>>>>>> it says "no such file or directory" or something that means it's not supported with -d (basically, it expects a whole disk).
>>>>>
>>>>> No, it is not. In illumos, zpool import -d expects the name of the directory where your disk device files are; -d has nothing to do with whole disks.
>>>>>
>>>>> zpool import [-D] [-d dir]
>>>>>
>>>>> Note it's ‘dir’, not ‘device’ like it is in OpenZFS. The default is to search /dev/dsk, where your disk device nodes are located.
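>>>>>
>>>>> For example, the working forms look like this (just a sketch; the pool and device names are the ones used elsewhere in this thread):
>>>>>
>>>>> # zpool import -d /dev/dsk dpool        (illumos: -d names a directory of device nodes)
>>>>> $ sudo zpool import -d /dev/sda1 dpool  (OpenZFS on Linux: -d may also name a specific device)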
>>>>>
>>>>>
>>>>>
>>>>> Secondly, you are using the command as "zpool import dpool -d /dev/dsk/c3t0d0s6", while the synopsis tells you:
>>>>>
>>>>> zpool import [-Dfmt] [-F [-n]] [--rewind-to-checkpoint]
>>>>> [-c cachefile|-d dir] [-o mntopts] [-o property=value]... [-Rroot]
>>>>> pool|id [newpool]
>>>>>
>>>>> So, what is happening there in your case:
>>>>>
>>>>> # zpool import dpool -d /dev/dsk/c3t0d0s6
>>>>> cannot open '/devices/pci@0,0/pci108e,4852@1f,2/cdrom@0,0:g/': Not a directory
>>>>> cannot import 'dpool': no such pool available
>>>>>
>>>>> Notice the error message 'Not a directory'. So let's look inside zpool with truss:
>>>>>
>>>>> resolvepath("/dev/dsk/c3t0d0s6", "/devices/pci@0,0/pci108e,4852@1f,2/cdrom@0,0:g", 1024) = 46
>>>>> open("/devices/pci@0,0/pci108e,4852@1f,2/cdrom@0,0:g/", O_RDONLY) Err#20 ENOTDIR
>>>>>
>>>>> There, "/dev/dsk/c3t0d0s6" was resolved to "/devices/pci@0,0/pci108e,4852@1f,2/cdrom@0,0:g", and zpool was attempting to use it as a directory; because it is not a directory, we get the corresponding error:
>>>>>
>>>>> "Err#20 ENOTDIR”.
>>>>>
>>>>> And since I do not have “dpool” anywhere on my disks, I also get an error from the attempt to import it.
>>>>>
>>>>> rgds,
>>>>> toomas
>>>>>
>>>>>>>>
>>>>>>>> You may ask why I am specifying the partition for import. That's because, often, both illumos and Linux (not sure for what reason) don't see or list the pool on my external SSD when I type "sudo zpool import"
>>>>>>>
>>>>>>> The unqualified import checks every slice/partition on every disk. So it's already looking
>>>>>>> at that partition. If it's not finding the pool, then that's the problem you need to look at.
>>>>>>>
>>>>>>> --
>>>>>>> -Peter Tribble
>>>>>>> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com