[OpenIndiana-discuss] ZFS; what the manuals don't say ...

George Wilson george.wilson at delphix.com
Tue Oct 23 14:22:21 UTC 2012


Comments inline...

On 10/23/12 8:29 AM, Robin Axelsson wrote:
> Hi,
> I've been using zfs for a while but still there are some questions 
> that have remained unanswered even after reading the documentation, so
> I thought I would ask them here.
>
> I have learned that zfs datasets can be expanded by adding vdevs. Say
> that you have created a raidz3 pool named "mypool" with the command
> # zpool create mypool raidz3 disk1 disk2 disk3 ... disk8
>
> you can expand the capacity by adding vdevs to it through the command
>
> # zpool add mypool raidz3 disk9 disk10 ... disk16
>
> The vdev that is added doesn't need to have the same raid/mirror 
> configuration or disk geometry, if I understand correctly. It will 
> merely be dynamically concatenated with the old storage pool. The 
> documentation says that it will be "striped", but it is not clear
> what that means if data is already stored in the old vdevs of the pool.
>
> Unanswered questions:
>
> * What determines _where_ the data will be stored on such a pool?
> Will it fill up the old vdev(s) before moving on to the new one or 
> will the data be distributed evenly?

The data is written in a round-robin fashion across all the top-level 
vdevs (i.e. the raidz vdevs). So it will get distributed across them as 
you fill up the pool. It does not fill up one vdev before proceeding.
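
To make the difference concrete, here is a toy Python model (an illustration, not ZFS code) contrasting plain concatenation, where one vdev fills before the next is used, with round-robin placement across top-level vdevs. Capacities are in hypothetical "blocks"; the real allocator rotates per chunk of data and biases the rotation by each vdev's free space, which this sketch ignores.

```python
def concatenate(capacities, n_blocks):
    """NOT how ZFS behaves: fill each vdev completely before using the next."""
    used = [0] * len(capacities)
    v = 0
    for _ in range(n_blocks):
        # Move on to the next vdev only once the current one is full.
        while v < len(capacities) and used[v] >= capacities[v]:
            v += 1
        if v == len(capacities):
            raise RuntimeError("pool is full")
        used[v] += 1
    return used


def round_robin(capacities, n_blocks):
    """Closer to ZFS: rotate across the top-level vdevs as blocks are written.

    Assumes every vdev still has room for the blocks it receives.
    """
    used = [0] * len(capacities)
    for i in range(n_blocks):
        used[i % len(capacities)] += 1
    return used


# Two equally sized top-level vdevs (e.g. two raidz3 groups), six blocks written.
print(concatenate([10, 10], 6))  # [6, 0]
print(round_robin([10, 10], 6))  # [3, 3]
```

With concatenation the second vdev sits idle until the first is full; with round-robin both vdevs take a share of every batch of writes.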

> * If the old pool is almost full, an even distribution will be 
> impossible, unless zpool rearranges/relocates data upon adding the 
> vdev. Is that what will happen upon adding a vdev?

No. Existing data is not relocated when a vdev is added. As you write new
data, ZFS will try to even out the vdevs. In many cases this is not
possible, and you may end up with the majority of the writes going to the
empty vdevs. There is logic in ZFS to avoid certain vdevs if we're unable
to allocate from them during a given transaction group commit. So when
vdevs are very full, you may find that very little data is being written
to them.
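
The effect of a nearly full vdev can be seen in a similarly simplified model. Blocks are handed out round-robin, but a vdev with no free space is skipped, loosely mirroring the allocator's habit of avoiding vdevs it cannot allocate from. The vdev names and sizes are hypothetical, and the real allocator also weights each vdev by its free space, so the exact split will differ.

```python
def allocate(vdevs, n_blocks):
    """Toy model of new writes to a pool whose vdevs have unequal free space.

    vdevs maps a vdev name to [used, capacity] in hypothetical block units
    and is updated in place. Blocks are handed out round-robin, skipping any
    vdev that is already full. Blocks already on disk are never moved.
    Returns how many of the new blocks landed on each vdev.
    """
    names = list(vdevs)
    written = {name: 0 for name in names}
    i = 0
    for _ in range(n_blocks):
        # Try each vdev at most once, starting where the rotation left off.
        for _attempt in range(len(names)):
            name = names[i % len(names)]
            i += 1
            used, capacity = vdevs[name]
            if used < capacity:
                vdevs[name][0] += 1
                written[name] += 1
                break
        else:
            raise RuntimeError("pool is full")
    return written


# An old raidz3 vdev that is 95% full next to a freshly added, empty one.
pool = {"raidz3-0": [95, 100], "raidz3-1": [0, 100]}
print(allocate(pool, 50))  # {'raidz3-0': 5, 'raidz3-1': 45}
```

Only the new writes are steered toward the empty vdev; the 95 blocks already on `raidz3-0` stay where they are.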

> * Can the individual vdevs be read independently/separately? If say 
> the newly added vdev faults, will the entire pool be unreadable or 
> will I still be able to access the old data? What if I took a snapshot 
> before adding the new vdev?

If you lose a top-level vdev, then you probably won't be able to access
your old data. If you're lucky, you might be able to retrieve some data
that was not contained on that top-level vdev, but given that ZFS stripes
across all vdevs, most of your data could be lost. Losing a
leaf vdev (i.e. a single disk) within a top-level vdev is a different
story. If you lose a leaf vdev, raidz will allow you to continue to
use that top-level vdev and the pool in a degraded state (a raidz3 vdev
tolerates up to three failed disks). You can then spare out the
failed leaf vdev or replace the disk.
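
The distinction above between losing a leaf vdev and losing a top-level vdev can be summarised as a simple rule, sketched below under the assumption that a raidz vdev tolerates as many failed disks as it has parity (1, 2 or 3) and an n-way mirror tolerates n - 1. It deliberately ignores partial recovery: some data might be salvageable after losing a top-level vdev, but the pool as a whole should be treated as lost. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TopLevelVdev:
    name: str
    redundancy: int        # raidz1/2/3 -> 1/2/3; n-way mirror -> n - 1; single disk -> 0
    failed_disks: int = 0

    def readable(self) -> bool:
        # Still usable (possibly DEGRADED) while failures <= redundancy.
        return self.failed_disks <= self.redundancy


def pool_readable(vdevs) -> bool:
    # Data and metadata are spread across all top-level vdevs,
    # so every one of them must still be readable.
    return all(v.readable() for v in vdevs)


pool = [TopLevelVdev("raidz3-0", 3), TopLevelVdev("raidz3-1", 3, failed_disks=2)]
print(pool_readable(pool))   # True: raidz3-1 is degraded but intact

pool[1].failed_disks = 4
print(pool_readable(pool))   # False: a whole top-level vdev is gone
```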
>
> * Can several datasets be mounted to the same mount point, i.e. can 
> multiple "file system"-datasets be mounted so that they (the root of 
> them) are all accessed from exactly the same (POSIX) path and 
> subdirectories with coinciding names will be merged? The purpose of 
> this would be to seamlessly expand storage capacity this way just like 
> when adding vdevs to a pool.

I think you might be confused about datasets and how they are expanded.
Datasets see all the space within a pool. There is no one-to-one
mapping of dataset to pool; many datasets share one pool's space. So if
you create 10 datasets and you find that you're running out of space,
you simply add another top-level vdev to your pool and all the datasets
see the additional space. I'm pretty certain that doesn't answer your
question, but maybe it helps in other ways. Feel free to ask again.
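
A small model of this point: every dataset draws from the same pool-wide free space, so adding a top-level vdev raises the space available to all of them at once. Sizes are in hypothetical units (an 8-disk raidz3 of 1 TB drives gives roughly 5 TB before overhead, since three disks' worth goes to parity). The dataset names `mypool/a` and `mypool/b` are made up, and quotas and reservations, which can limit an individual dataset, are left out.

```python
class Pool:
    """Toy model: datasets share the pool's space instead of owning a vdev."""

    def __init__(self):
        self.vdev_capacities = []   # usable size of each top-level vdev
        self.used_by = {}           # dataset name -> space consumed

    def add_vdev(self, capacity):
        self.vdev_capacities.append(capacity)

    def create_dataset(self, name):
        self.used_by[name] = 0

    def available(self):
        # The same figure applies to every dataset (quotas/reservations ignored).
        return sum(self.vdev_capacities) - sum(self.used_by.values())

    def write(self, dataset, amount):
        if amount > self.available():
            raise OSError("no space left in pool")
        self.used_by[dataset] += amount


pool = Pool()
pool.add_vdev(5)                # first raidz3 vdev, ~5 TB usable
for name in ("mypool/a", "mypool/b"):
    pool.create_dataset(name)
pool.write("mypool/a", 3)
pool.write("mypool/b", 2)
print(pool.available())         # 0: both datasets are out of space
pool.add_vdev(5)                # like "zpool add mypool raidz3 ..."
print(pool.available())         # 5: both datasets see the new space
```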

> * If that's the case how will the data be distributed/allocated over 
> the datasets if I copy a data file to that path?

Data from all datasets are striped across the top-level vdevs. The 
notion of a given dataset only writing to a single raidz device in the 
pool does not exist.
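
In a simplified round-robin model, the rotation belongs to the pool rather than to any dataset, so blocks written by different datasets are interleaved on the same top-level vdevs. The function and dataset names below are hypothetical, and free-space weighting is ignored.

```python
from collections import defaultdict
from itertools import cycle


def place_writes(writes, vdevs):
    """Toy model: writes is a sequence of (dataset, n_blocks) in write order.

    A single rotor is shared by the whole pool, so each dataset's blocks
    end up spread over every top-level vdev. Returns dataset -> {vdev: blocks}.
    """
    rotor = cycle(vdevs)
    layout = defaultdict(lambda: defaultdict(int))
    for dataset, n_blocks in writes:
        for _ in range(n_blocks):
            layout[dataset][next(rotor)] += 1
    return {ds: dict(per_vdev) for ds, per_vdev in layout.items()}


print(place_writes([("mypool/a", 4), ("mypool/b", 4)], ["raidz3-0", "raidz3-1"]))
# {'mypool/a': {'raidz3-0': 2, 'raidz3-1': 2}, 'mypool/b': {'raidz3-0': 2, 'raidz3-1': 2}}
```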

Thanks,
George

>
> Kind regards
> Robin.