[OpenIndiana-discuss] Disk Space Disappearing.

Timothy Coalson tsc5yc at mst.edu
Thu Jul 31 00:06:09 UTC 2014


On Wed, Jul 30, 2014 at 12:48 PM, Dormition Skete (Hotmail) <
dormitionskete at hotmail.com> wrote:

>
> On Jul 30, 2014, at 4:49 AM, Jim Klimov <jimklimov at cos.ru> wrote:
>
> >
> > Hello Peter, nice to hear from you again!
> >
> > after a quick look at zfs-list'ings, I see that many of your current
> ZBE's have more and more used space while referenced remains the same. Do
> you have zfs-auto-snap running? To me it seems like your zones keep lots of
> short-lived data (access logs that are recompressed and deleted, email
> queues from local daemons, etc.) and that ends up referenced only from
> snapshots. On the up-side, modern zfs-snap services will actively destroy
> older auto-snaps to free up the threshold percentage of pool size
> (configurable in their smf settings).
> >
> > It might help to enable compression on the zone-roots if you haven't
> done so already, and/or split the zone filesystem into more datasets
> similar in ideology to my split-root setup for global zones (there are
> implementation differences that i haven't published yet iirc). For example,
> if you split off the webserver log directories, you can manage a different
> compression as well as a different auto-snap schedule from the rest of your
> system.
> >
> > HTH, Jim
> > --
> > Typos courtesy of K-9 Mail on my Samsung Android
> >
>
> Jim, it’s nice to hear your friendly voice on this list again!  I haven’t
> seen you on it since I rejoined the list on July 4th.  I was starting to
> wonder if something had happened to you, or if you were just traveling or
> something.
>
> I was up most of the night, so I’m real tired and not real coherent right
> now, but I never have quite understood what that reference column on the
> “zfs list” means.  I have always assumed it was whatever has changed from
> whatever the source (snapshot?) for that pool was.  Am I even close?
>

No - "referenced" (the REFER column) is the total number of bytes the
filesystem currently references, and that includes bytes that are also held
by snapshots.  So, if you create a filesystem, put a 20GB file in it, and
snapshot it, both the snapshot and the filesystem will show 20G referenced.
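A minimal sketch of that behavior (the pool and dataset names here are made
up for illustration, and this needs root and a real ZFS pool to run):

```shell
# Create a throwaway dataset (assumes a pool named "rpool" and the
# default inherited mountpoint of /rpool/demo).
zfs create rpool/demo
dd if=/dev/urandom of=/rpool/demo/file bs=1M count=100

# Snapshot it, then compare USED vs. REFER for both objects.
zfs snapshot rpool/demo@snap1
zfs list -t all -o name,used,refer -r rpool/demo
# The snapshot and the live filesystem report the same REFER, because
# the snapshot references the same blocks the filesystem does; the
# snapshot's USED stays near zero until the live data diverges from it.
```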

Something that may be useful here is "zfs list -o space" - this outputs
several different measures of "space used", and the column to pay attention
to is USEDSNAP - it should help you find filesystems that have a lot of
data existing only in snapshots.  I have a pair of scripts that are useful
for determining which snapshots are eating that space; maybe I should host
them somewhere.  In your case, though, it is probably all of the snapshots,
so they would not be as useful.
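For reference, the "-o space" shorthand expands to a fixed set of columns;
you can also request them by name (again, "rpool" is just an example pool):

```shell
# USEDSNAP is the space that would be freed by destroying every
# snapshot of that dataset; USEDDS is the live dataset's own data.
zfs list -o space -r rpool

# Equivalent explicit column list:
zfs list -o name,avail,used,usedsnap,usedds,usedrefreserv,usedchild -r rpool
```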


> A little background:
>
> After almost a year of procrastinating, I decided last month to take a
> little time to write a script that would let me easily shut down a zone,
> detach it, take snapshots of it, do a “zfs send” on it to a file that I
> could save on another computer to restore from if necessary, reattach the
> zone, and reboot it.  I then ran that script on each of our non-global
> zones, and copied those snapshots to my laptop.
>
> I think that both killed me, and helped me.
>
> Early in the morning of July 3rd or 4th, our main server went down.  Later
> analysis seems to show that I ran it out of disk space.  I’m guessing it is
> from the same problem I noticed last night on our current main server.
>
> *I did have automatic snapshots turned on on that computer.  I have never
> had them turned on on the computer we’re currently using as our main
> server.*  I have since turned automatic snapshots off on all of our OI
> servers, and have deleted all of the automatic snapshots from them.
>
> So anyway, after that server went down on the 3rd/4th — to make a long
> story short — I took those snapshots that I had saved on my laptop, did a
> “zfs receive” on them on what we are currently using as our main server,
> attached them, booted them, updated the data itself, etc.
>
> So, the source of all of the non-global zones on our current main server
> is those snapshots.
>
> Would the change in data (email, database files, etc.), or most
> especially, the nightly backups that I described in my previous post, be
> likely to have caused those big changes in referenced data?
>
> And, since I raced down to the server room and deleted those source
> snapshots that those zones were created from — which resulted in freeing up
> a lot of disk space — can I reasonably expect that the server should now be
> reasonably stable — that I won’t drop 11 G tonight when our nightly backup
> routines back up all of our data again?  (Deleting the EmailArchive02.tgz
> file, renaming EmailArchive01.tgz to EmailArchive02.tgz, and backing up the
> current data to EmailArchive01.tgz.)
>

The fact that one temporary archive exists at all times means every
snapshot will capture at least one of them.  So, assuming your archives
rotate daily and the default snapshot schedule is enabled, you will keep
one per day until they are a week old, one per week until they are a month
old, and one per month until they are a year old.  If those archives are
11GB each, I think you have your culprit.
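To confirm which snapshots are pinning the archives, you can sort snapshots
by the space each one uniquely holds (the snapshot name below is a
hypothetical auto-snap name, not one from your system):

```shell
# Sort all snapshots by USED; ones pinning a rotated-away 11GB archive
# will stand out near the bottom of the ascending list.
zfs list -t snapshot -o name,used,refer -s used -r rpool

# Preview how much space destroying one would free, without doing it
# (-n is a dry run, -v prints what would be destroyed):
zfs destroy -nv rpool/zones/myzone/ROOT@zfs-auto-snap_daily-2014-07-30-00h00
```

Note that USED on a single snapshot only counts blocks unique to it; space
shared among several snapshots is only freed once all of them are gone.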

One easy way around this, without disabling automatic snapshots anywhere
else, is to make a new filesystem for the purpose of holding these archives
(and anything else you rotate daily) and set its ZFS property
"com.sun:auto-snapshot" to "false", which makes the automatic snapshot
services skip that filesystem.
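Concretely, that would look something like this (the dataset name is a
placeholder; use whatever fits your zone layout):

```shell
# Create a dedicated filesystem for the rotating archives.
zfs create rpool/export/archives

# Tell the zfs-auto-snapshot services to skip it entirely.
zfs set com.sun:auto-snapshot=false rpool/export/archives

# Verify the property took effect.
zfs get com.sun:auto-snapshot rpool/export/archives
```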


> Also, if I do a “beadm destroy -s OpenIndiana-151a7-2014-0705A”, will that
> be a safe thing for me to do, and if so, would that free up 10 G of space
> on the drive?
>

I see "10M", not "10G", so I'm going to say no, it won't free that much
space.


> Currently, our system shows:
>
> # beadm list
> BE                           Active Mountpoint Space Policy Created
> OpenIndiana-151a7-2014-0705A -      -          10.9M static 2014-07-05
> 17:13
> OpenIndiana-151a7-2014-0706A NR     /          4.39G static 2014-07-06
> 22:05
> openindiana                  -      -          12.0M static 2014-07-04
> 17:39
>
> I only got a couple of hours of sleep before having to get up for other
> duties, so I’m going to try to grab an hour or two of sleep now; but if
> somebody could shed some light on this mystery for me, I would really
> appreciate it.
>
> This has been a brutal month of server woes.  I want to go back to
> programming!
>
>
> _______________________________________________
> openindiana-discuss mailing list
> openindiana-discuss at openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss
>

