We’re all used to doing a disk check in Windows XP. It’s easy. Just
double-click on “My Computer”, then select the drive you want to run
the check on. Right-click, Properties, Tools tab, then select “Check
Now…” in the Error-checking section. In almost every instance you’ll be
told that the check will be done upon the next reboot. Easy.
So how does one go about it on Linux? Well… as you may have guessed,
it’s not quite so straightforward. Linux, by default, does actually
have an intelligent disk-checking system already in place. By all
accounts, you generally needn’t worry. But if you have reason to
believe your disk may be slowly dying, and the drive’s SMART status reports nothing wrong, it may be worth checking the file system instead.
That’s where File System Check comes in (duh!). Like all Linux
tools, it’s painfully abbreviated to simply “fsck”. Terse, to say the
least. Now the warning:
DO NOT. I REPEAT, DO NOT EVER EVER EVER RUN THIS COMMAND WHILE YOUR
DRIVE IS MOUNTED (I.E. IN USE). I TAKE NO RESPONSIBILITY FOR ANY LOSS
OF DATA THAT YOU MAY CAUSE BY FOLLOWING THESE INSTRUCTIONS.
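Since the warning above is the part you really don't want to get wrong, here is a minimal sketch of a pre-flight check. The function name is_mounted is my own invention, not a standard tool. It assumes the kernel's mount table is readable at /proc/mounts, and it does an exact string match, so a volume listed under another name (a /dev/mapper path, for instance) would be missed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the given device appears as a
# mounted source in a mounts table. The table defaults to /proc/mounts; the
# optional second argument exists only so it can be tried on a sample file.
is_mounted() {
    device=$1
    table=${2:-/proc/mounts}
    # Field 1 of each line is the mounted device; compare it exactly.
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$table"
}

# Usage: refuse to continue if the volume is still in use.
# if is_mounted /dev/VolGroup00/LogVol00; then
#     echo "Still mounted, not running fsck" >&2
# fi
```

Treat a "not mounted" answer as a hint rather than proof, and look at the output of mount yourself before going any further.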
To unmount your root (/) volume, follow these easy steps:
- Boot from a Live CD. Your root volume will not be mounted by default.
- Open a terminal and type:
# dmesg | grep sda
If you see output relating to a “SCSI” device named sda, that is your first hard disk and, in all likelihood, the one containing your root partition. For example, amongst other output, I see this:
sd 2:0:0:0: [sda] Assuming drive cache: write through
sda: sda1 sda2
sd 2:0:0:0: [sda] Attached SCSI disk
- In the example above, we see that the Linux kernel has registered the SCSI device at address 2:0:0:0 as the first disk (“sda”) in the system. We can also see it has only 2 partitions, sda1 and sda2. If this is the only physical drive in the machine, we should strongly suspect that it uses one partition as /boot (formatted with ext3) and the other as an LVM Volume Group containing both root (/) and swap. Furthermore, it’s a foregone conclusion that the smaller partition will be /boot and the larger one will contain our swap and / volumes, so let’s proceed with accessing them.
- So, how do we access a “Logical Volume” within an equally mystical
“Volume Group”? Luckily, Linux LVM comes with a plethora of useful
tools to make the job easy.
# /sbin/vgscan
Reading all physical volumes. This may take a while...
Found volume group "VolGroup00" using metadata type lvm2
Great. We have identified the volume group. But before we can identify the logical volumes it contains, we need to activate it.
# /sbin/vgchange -a y
2 logical volume(s) in volume group "VolGroup00" now active
Here, the -a flag indicates that we want to change the “active” status of the volume group, and the y means “yes”.
# /sbin/lvdisplay
--- Logical volume ---
LV Name /dev/VolGroup00/LogVol00
VG Name VolGroup00
LV UUID DG2WxJ-sKa5-20mg-NtjW-CsPW-t99V-Egqlja
LV Write Access read/write
LV Status available
# open 0
LV Size 7.25 GB
Current LE 232
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:2

--- Logical volume ---
LV Name /dev/VolGroup00/LogVol01
VG Name VolGroup00
LV UUID HqKozT-16PQ-HUaT-Yyc7-lMCO-007m-Xcc2c8
LV Write Access read/write
LV Status available
# open 1
LV Size 512.00 MB
Current LE 16
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:3
We can now see two logical volumes contained within the volume group. The first, although small by today’s standards, is a lot larger than the second. We can also see that each logical volume has a device node (/dev/VolGroup00/LogVol01, for example).
As we want to perform the disk check without the partition being mounted, we do not issue any mount command here. However, if you want to double-check that this is the partition to check, mount it and have a quick look around. The following step is only offered to help in that case – skip it if you just wish to perform the disk check.
For me, the first logical volume (the 7.25 GB one) would be the one to test.
# mkdir /tmp/lv0
# mount -t ext3 /dev/VolGroup00/LogVol00 /tmp/lv0
# cd /tmp/lv0
# ls
bin boot dev etc home lib lib64 lost+found media mnt opt proc root sbin selinux srv sys tmp usr var
Ok, that looks like the root partition, so let’s get out of it and unmount it before running the file system check on it.
# cd /
# umount /tmp/lv0
- An alternative to the above steps, if you have already booted into
your main system, is to investigate /etc/fstab to see which is your /
volume. All you do is open a terminal and issue:
# cat /etc/fstab
On my CentOS 5.2 system, I see this:
/dev/VolGroup00/LogVol00 / ext3 defaults 1 1
LABEL=/boot1 /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
LABEL=SWAP-sdb1 swap swap defaults 0 0
So, /dev/VolGroup00/LogVol00 is my root volume.
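If you'd rather not pick the line out by eye, the lookup can be sketched in a few lines of shell. The function name root_device is made up for this example. It assumes the usual whitespace-separated fstab layout, with the device in the first field and the mount point in the second.

```shell
#!/bin/sh
# Hypothetical helper: print the device that an fstab-style file mounts on /.
# The file defaults to /etc/fstab; pass another path to try it on a copy.
root_device() {
    fstab=${1:-/etc/fstab}
    # Skip comment lines, then print field 1 where the mount point is "/".
    awk '$1 !~ /^#/ && $2 == "/" { print $1; exit }' "$fstab"
}

# Usage:
# root_device
# root_device /tmp/lv0/etc/fstab
```

On some systems the first field is a LABEL= or UUID= entry rather than a device path, in which case you still need to work out which device carries that label.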
So, now that that’s out of the way, what next? Well, assuming you
now know which is your root partition, the most sensible thing to do
would be to boot from a Live CD of some distribution (Ubuntu, Fedora, etc) if you haven’t done so, and then perform the disk check from that.
Once in the LiveCD desktop, we’ll need to fire up a Terminal window.
If you know your filesystem type, e.g. if it’s Ext3, which is the
default on the most common distributions, you can run a modified version
of the fsck command specifically for that file system. Here’s what I
run for a thorough disk check:
# fsck.ext3 -c -D -f -p -v /dev/VolGroup00/LogVol00
Alternatively, if your partition structure is slightly older and only contains physical partitions (not Logical Volumes), it may just be a case of finding the partition directly – by checking /etc/fstab on the running system. In that case, your command may look more like this (when / is unmounted!!):
# fsck.ext3 -c -D -f -p -v /dev/sda2
Here’s what the flags do:
-c – forces a bad block scan (a read-only scan using the badblocks
program). Note the lower case: an upper-case -C only controls the
progress display. Although bad blocks are remapped dynamically, if the
file system or its journal are corrupt, this may not work correctly.
-D – performs a directory check and optimisation. Doesn’t hurt, and
can speed up directory listings of a large number of files.
-f – forces the check itself to actually run. As mentioned
previously, the file system maintains itself quite well, and if you
don’t force the check, fsck may look at the last check interval and
decide a check is not required.
-p – automatically fixes any problems that can be safely fixed without
asking you (again, note the lower case). This is usually a safe flag,
but if your file system is potentially very corrupt, this may not be
advisable. In this situation, contact an expert – or restore
your back-up…
-v – verbose output. See what’s going on.
/dev/VolGroup00/LogVol00 or /dev/sda2 – this is the partition I want to perform the disk check on.
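Once the check has finished, fsck's exit status tells you how it went. It is a sum of the bit values listed in the fsck(8) man page. Here is a rough sketch of a decoder; the function name describe_fsck_status is hypothetical and the wording of the messages is mine.

```shell
#!/bin/sh
# Hypothetical helper: turn an fsck exit status into readable text.
# The status is a sum of the bit values documented in fsck(8).
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for entry in \
        "1:file system errors corrected" \
        "2:system should be rebooted" \
        "4:file system errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking cancelled by user request" \
        "128:shared-library error"
    do
        bit=${entry%%:*}    # number before the colon
        text=${entry#*:}    # message after the colon
        if [ $((status & bit)) -ne 0 ]; then
            echo "$text"
        fi
    done
    return 0
}

# Usage, straight after a check:
# fsck.ext3 -f -v /dev/VolGroup00/LogVol00
# describe_fsck_status $?
```

A status with the 4 bit set means some errors were left unfixed, which is the point where the expert or the back-up comes in.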
This little guide doesn’t explain how to perform a check on an encrypted logical volume… That one’s coming.