File compression is necessary for several reasons, be it more efficient
storage or more efficient transmission of your application's content. You may
not want to develop full-blown file compression software, only a smaller
version of it. To help .NET developers with this task, ICSharpCode provides a
.NET assembly called 'SharpZipLib'. This library supports the ZIP, GZIP, TAR
and BZ2 (BZip2) formats. The library is open source, so you can download the
source code and make your optimizations and fixes directly in the core, or lift
just the components you need and plug them into your application. In the
project for this article, we use the binary SharpZipLib assembly to develop a
tool similar to popular tools like WinZip. Our current code processes only ZIP
files, but adding support for the other formats should take only a few changes
here and there.
Understanding the project
Our project has six source files in all. SplashScreen, MainWindow,
CommentsSetting and CompressionSetting together handle the application's UI,
while FileSystem and Zipper are classes that do the background work. We will
not delve into the code of three of these files. It suffices to say that
SplashScreen is the application's startup screen, unmodified from what VS.NET
2005 generates when you add a form from the 'Splash Screen' template to the
project.
The name, version and copyright information for this screen come directly from
the project's properties at runtime. The other two files, CommentsSetting and
CompressionSetting, each display one dialog box that lets the user add or
change the ZIP file comment and the compression level, respectively. FileSystem
is a wrapper around the Path, File and Directory classes that adds some
functionality of its own. Zipper is our interface to the SharpZipLib component:
it wraps the library's functions and performs exception handling.
Code flow
When the user runs the application, the runtime shows the splash screen
(SplashScreen.vb) and times it out after 5 seconds. The MainWindow UI is then
displayed, and the application waits for user interaction through its menu
system. The user opens an existing ZIP file or creates a new one before using
any other functionality.
When a menu item is selected, the MainWindow code checks whether the request
is relevant (making adjustments if required) before passing it on to the
functions within Zipper. Once the operation completes, successfully or not, the
results are shown on the UI. The UI also has a status bar, which displays some
progress information. Let's now look at the code in Zipper.
Zipper.vb
At the start of this file, you will notice a 'Structure' defined. It is used
by the last function in the file (ListContents, lines 393-451), which reads a
ZIP file and loads its directory into an array of FileInZip structures. The
caller of ListContents then uses this information to write the directory to the UI.
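A minimal sketch of what such a structure might look like is below. The article does not show FileInZip's fields, so the name FileInZipSketch and its members are assumptions; they simply mirror the Name, Size, CompressedSize and DateTime properties of SharpZipLib's ZipEntry. The SavingsPercent helper is likewise hypothetical and shows one value a listing UI could display.

```vbnet
Imports System

' Hypothetical stand-in for the article's FileInZip structure; its real
' fields are not shown. These mirror SharpZipLib ZipEntry properties.
Public Structure FileInZipSketch
    Public Name As String           ' entry name, e.g. "docs/readme.txt"
    Public Size As Long             ' uncompressed size in bytes
    Public CompressedSize As Long   ' size inside the ZIP in bytes
    Public Modified As DateTime     ' entry timestamp

    ' Space saved as a percentage of the original size (0 for empty files)
    Public Function SavingsPercent() As Double
        If Size <= 0 Then Return 0
        Return 100.0 * (1.0 - CDbl(CompressedSize) / CDbl(Size))
    End Function
End Structure
```

ListContents could fill one such value per entry it reads and hand the array to MainWindow for display; a 1000-byte file stored in 250 bytes would show a saving of 75 percent.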
This class has nine functions and one important public property, ZipError.
All consumers of the Zipper class must examine the value of ZipError to
determine error status.
The called function also signals that an error occurred: functions with a
Boolean return type return 'False', and the others return blank values.
Typical errors range from incorrect information supplied for a task to
corruption in the ZIP file.
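The error convention can be sketched as follows. ZipperSketch, CheckArchive and GetArchiveName are hypothetical names, not functions from the article's Zipper class; only the pattern is taken from the text: a Boolean function returns False, a value-returning function returns a blank value, and in both cases the ZipError property describes what went wrong.

```vbnet
Imports System
Imports System.IO

' Hypothetical sketch of the Zipper error convention (names are illustrative)
Public Class ZipperSketch
    ' Empty when the last call succeeded; otherwise describes the failure
    Private _zipError As String = String.Empty

    Public ReadOnly Property ZipError() As String
        Get
            Return _zipError
        End Get
    End Property

    ' Boolean-returning function: reports failure by returning False
    Public Function CheckArchive(ByVal zipFilePath As String) As Boolean
        _zipError = String.Empty
        If Not File.Exists(zipFilePath) Then
            _zipError = "ZIP file not found: " & zipFilePath
            Return False
        End If
        Return True
    End Function

    ' Value-returning function: reports failure with a blank value
    Public Function GetArchiveName(ByVal zipFilePath As String) As String
        If Not CheckArchive(zipFilePath) Then Return String.Empty
        Return Path.GetFileName(zipFilePath)
    End Function
End Class
```

A caller tests the return value first and reads ZipError only when that value indicates failure, so a blank result is never mistaken for valid data.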
You will notice that most of the code in this class manipulates path strings.
This is important because, while the program itself deals with absolute paths
in MS-DOS format (backslash, '\'), the ZIP file stores files with relative
paths in UNIX format (forward slash, '/'). If path information is passed along
incorrectly, the resulting output is unusable. Let's take the actual zipping
(lines 45-73) and unzipping (lines 315-355) code and analyze it. The zipping
code is as follows.
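' Create the ZIP archive on disk and set its compression options
zFile = New ZipOutputStream(File.Create(DestinationFileName))
With zFile
    .SetLevel(CompressionLevel)
    .SetMethod(ZipOutputStream.DEFLATED)
End With
' Read the whole source file into memory; unlike a single Stream.Read
' call, File.ReadAllBytes keeps reading until every byte is loaded
Buf = File.ReadAllBytes(SourceFilePath)
' Describe the entry: its name inside the ZIP, timestamp and original size
zEntry = New ZipEntry(Path.GetFileName(SourceFilePath))
With zEntry
    .DateTime = Now
    .Size = Buf.Length
End With
' Compute the CRC-32 checksum of the uncompressed data
With oCrc32
    .Reset()
    .Update(Buf)
    zEntry.Crc = .Value
End With
' Add the entry, write its data, then finalize and close the archive
With zFile
    .PutNextEntry(zEntry)
    .Write(Buf, 0, Buf.Length)
    .Finish()
    .Close()
End With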
The zipping operation is implemented as a Stream object (ZipOutputStream).
So, the first statement above instantiates it by passing it the FileStream that
File.Create opens for our destination ZIP file.
Quick hint: Similar functions exist within the BZip2, GZip and Tar classes.
Next, we need not concern ourselves with the nitty-gritty of actually
compressing each file. All we do is set the compression level using a simple
'SetLevel' method. This method takes a parameter (CompressionLevel), which can
vary from 0 through 9, with 9 being the maximum compression and 0 meaning no
compression. Then we read the contents of the file to add to the ZIP into a
Byte array (Buf). Now we need to compute its 32-bit CRC value. This is very
simply done in line 64 by 'oCrc32.Update(Buf)': the Crc32 class provided by the
SharpZipLib library handles this for us.
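To show what the Crc32 class computes, here is an illustrative table-driven CRC-32 in VB.NET using the standard reflected polynomial &HEDB88320, the checksum the ZIP format stores for each entry. It is a sketch for understanding only, not SharpZipLib's source; the module name Crc32Sketch is hypothetical, and in the project you would keep using the library's Crc32 class, whose Value property returns the checksum as a Long.

```vbnet
Imports System
Imports System.Text

' Illustrative CRC-32 (reflected polynomial &HEDB88320), as used by ZIP.
' Hypothetical module for explanation only; not SharpZipLib's source.
Public Module Crc32Sketch
    ' 256-entry lookup table, built once when the module is first used
    Private ReadOnly Table As UInteger() = BuildTable()

    Private Function BuildTable() As UInteger()
        Dim t(255) As UInteger
        For n As Integer = 0 To 255
            Dim c As UInteger = CUInt(n)
            ' Process the 8 bits of n, least significant bit first
            For k As Integer = 0 To 7
                If (c And 1UI) <> 0UI Then
                    c = &HEDB88320UI Xor (c >> 1)
                Else
                    c = c >> 1
                End If
            Next
            t(n) = c
        Next
        Return t
    End Function

    ' Equivalent to Reset() followed by Update(buf) on a CRC-32 object
    Public Function Compute(ByVal buf As Byte()) As UInteger
        Dim crc As UInteger = &HFFFFFFFFUI
        For Each b As Byte In buf
            crc = Table(CInt((crc Xor b) And &HFFUI)) Xor (crc >> 8)
        Next
        Return crc Xor &HFFFFFFFFUI
    End Function
End Module
```

For the ASCII bytes of "123456789", Compute returns the standard CRC-32 check value &HCBF43926, and for an empty array it returns 0.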
Finally, we assign this CRC value to the Crc property of our ZipEntry object
(zEntry) and write the bytes using the ZipOutputStream's Write method. Simple,
isn't it?
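The zipping listing names each entry with Path.GetFileName alone. When a whole folder is zipped, each entry name instead has to be the file's path relative to that folder, with '/' separators, as described earlier. A minimal sketch of that conversion follows, assuming the base folder and the files are absolute Windows paths; ZipPathSketch and ToEntryName are hypothetical names, not the article's code.

```vbnet
Imports System

' Hypothetical helper; the article's own path code may differ.
Public Module ZipPathSketch
    ' Converts an absolute Windows path under baseDir into a ZIP entry name:
    ' relative to baseDir and using forward slashes.
    Public Function ToEntryName(ByVal baseDir As String, ByVal fullPath As String) As String
        Dim root As String = baseDir.TrimEnd("\"c)
        If Not fullPath.StartsWith(root & "\", StringComparison.OrdinalIgnoreCase) Then
            Throw New ArgumentException("Path is not under the base folder.", "fullPath")
        End If
        ' Drop the base folder and its trailing backslash
        Dim relative As String = fullPath.Substring(root.Length + 1)
        Return relative.Replace("\"c, "/"c)
    End Function
End Module
```

For a base folder C:\Work, the file C:\Work\docs\a.txt becomes the entry name docs/a.txt.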
[Screenshot: Our program lets the user change the compression settings for any file in the ZIP. The drop-down shows the file selection to which the setting applies.]
Now let us move on to the unzip code. The listing of lines 315-355 below
leaves out the path-manipulation code.
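' Open the ZIP file for sequential, entry-by-entry reading
Dim Src As New ZipInputStream(File.OpenRead(ZipFilePath))
Do
    ' Advance to the next entry; Nothing means the end of the archive
    theEntry = Src.GetNextEntry()
    If (theEntry Is Nothing) Then Exit Do
    efPath = theEntry.Name
    ...
    entryFileName = Path.GetFileName(efPath)
    ' Directory entries have no file name, so there is no data to write
    If (entryFileName.Length > 0) Then
        If ( ... ) Then
            ...
        End If
        ' Copy the entry's decompressed data to the target file in chunks
        SW = File.Create(targetName)
        Do
            Size = Src.Read(Data, 0, Data.Length)
            If (Size > 0) Then
                SW.Write(Data, 0, Size)
            Else
                Exit Do
            End If
        Loop
        SW.Close()
    End If
    End If ' matches an If in the elided "..." code
Loop
' Release the ZIP file once every entry has been processed
Src.Close()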
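The elided path-manipulation code has to turn each entry name, which is relative and uses '/', back into a Windows path under the extraction folder. One way to do that is sketched below. UnzipPathSketch and TargetPathFor are hypothetical names, and the check that rejects absolute names and '..' segments is our own addition rather than something the article describes. Without such a check, a crafted entry name could write files outside the target folder.

```vbnet
Imports System

' Hypothetical stand-in for the elided path code; not the article's.
Public Module UnzipPathSketch
    ' Maps a ZIP entry name to a Windows path under targetDir. Names that are
    ' absolute or contain a ".." segment are rejected, because they could
    ' otherwise place files outside the extraction folder.
    Public Function TargetPathFor(ByVal targetDir As String, ByVal entryName As String) As String
        Dim normalized As String = entryName.Replace("/"c, "\"c)
        If normalized.StartsWith("\", StringComparison.Ordinal) OrElse normalized.Contains(":") Then
            Throw New ArgumentException("Absolute entry name: " & entryName)
        End If
        For Each segment As String In normalized.Split("\"c)
            If segment = ".." Then
                Throw New ArgumentException("Entry escapes the target folder: " & entryName)
            End If
        Next
        Return targetDir.TrimEnd("\"c) & "\" & normalized
    End Function
End Module
```

The result could serve as targetName in the listing, after creating its parent folder with Directory.CreateDirectory and Path.GetDirectoryName.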
Just as writing to the ZIP file involves a ZipOutputStream, reading from it
requires a ZipInputStream. This again is instantiated with an opened instance
of the ZIP file we want to extract from. Once it is open, we enumerate the
entries of the ZIP file one by one using the GetNextEntry method of the
ZipInputStream class; the stream is read sequentially, so there is no direct
way to jump to a particular file.
For each file entry, we create a disk file of the same name (File.Create), read
the data using the ZipInputStream's Read method, and write it to the disk file
we created. Now, where exactly did we uncompress the information or check its
CRC values? The ZipInputStream's Read method does this automatically for us. If
there are problems, the Try-Catch construct around these lines in our file
handles the exception.
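The inner Do loop of the unzip listing is a general stream-copy pattern, which can be factored into a helper like the hypothetical CopyStream below. It assumes nothing about SharpZipLib beyond what the listing shows: ZipInputStream.Read returns decompressed bytes, and 0 once the current entry is exhausted. Because Read may return fewer bytes than requested, only the count it returns is written.

```vbnet
Imports System
Imports System.IO

' Hypothetical helper: the read/write loop from the unzip listing.
' Works for any readable Stream, including SharpZipLib's ZipInputStream.
Public Module StreamCopySketch
    Public Function CopyStream(ByVal source As Stream, ByVal destination As Stream, _
                               Optional ByVal bufferSize As Integer = 4096) As Long
        Dim buffer(bufferSize - 1) As Byte
        Dim total As Long = 0
        Do
            ' Read may return fewer bytes than requested; only 0 means "no more data"
            Dim count As Integer = source.Read(buffer, 0, buffer.Length)
            If count = 0 Then Exit Do
            destination.Write(buffer, 0, count)
            total += count
        Loop
        Return total
    End Function
End Module
```

On .NET Framework 4 and later, Stream.CopyTo performs an equivalent loop.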
The rest of the code in this file should be self-explanatory, but let us
discuss two key points before we move on. First, our ZipOperation function is
what actually calls both ZipAFile and Unzip (the functions containing the code
above), and ZipOperation takes a parameter called 'DoAndWait'.
This parameter is useful when several files are being added to a ZIP file,
since recompressing the ZIP file every time is quite time-consuming. When
'True', it causes the file to be copied to a temporary on-disk location without
creating the ZIP file itself. This also means that in a sequence of calls to
ZipOperation, the last one must have DoAndWait set to 'False', or the ZIP file
will never be created.
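The DoAndWait behavior can be sketched as below. StagedZipperSketch, AddFile and the zipFolder callback are hypothetical; the callback stands in for ZipAFolder so that the staging logic can be shown without SharpZipLib. Files are copied into the staging folder on every call, and the folder is zipped and then deleted only on the call where doAndWait is False.

```vbnet
Imports System
Imports System.IO

' Hypothetical sketch of the DoAndWait idea; names are illustrative only.
' zipFolder stands in for ZipAFolder and receives the staging folder path.
Public Class StagedZipperSketch
    Private ReadOnly _stagingDir As String
    Private ReadOnly _zipFolder As Action(Of String)

    Public Sub New(ByVal stagingDir As String, ByVal zipFolder As Action(Of String))
        _stagingDir = stagingDir
        _zipFolder = zipFolder
    End Sub

    ' Copies the file into the staging folder. Only when doAndWait is False is
    ' the whole folder zipped in one pass, after which it is removed.
    Public Sub AddFile(ByVal sourceFile As String, ByVal doAndWait As Boolean)
        Directory.CreateDirectory(_stagingDir)
        File.Copy(sourceFile, Path.Combine(_stagingDir, Path.GetFileName(sourceFile)), True)
        If Not doAndWait Then
            _zipFolder.Invoke(_stagingDir)
            Directory.Delete(_stagingDir, True)
        End If
    End Sub
End Class
```

Adding three files would therefore mean two calls with doAndWait set to True followed by one with False.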
This brings us to the second key point. We zip files by copying them to a
temporary location (the value of WorkingDirectory plus the ZIP filename); once
all the files are there and DoAndWait is False, ZipAFolder is called to zip the
entire folder in one pass. This saves a lot of computing power, but requires as
much temporary disk space as the copied files occupy for the duration of the
operation. So be careful when compressing large files.
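Since the staged copies need as much free space as the files themselves, a check before staging can prevent an operation from failing halfway through. The sketch below is a hypothetical helper, not part of the article's code: it derives the staging folder from WorkingDirectory and the ZIP file name (assumed here to be a subfolder named after the ZIP file), totals the bytes to be copied, and compares that with the free space DriveInfo reports.

```vbnet
Imports System
Imports System.IO

' Hypothetical helpers for the staging approach (names are illustrative).
Public Module StagingSketch
    ' Staging folder: WorkingDirectory plus the ZIP file name
    Public Function StagingFolderFor(ByVal workingDirectory As String, ByVal zipFilePath As String) As String
        Return Path.Combine(workingDirectory, Path.GetFileName(zipFilePath))
    End Function

    ' Bytes of temporary disk space the staged copies will occupy
    Public Function BytesNeeded(ByVal files As String()) As Long
        Dim total As Long = 0
        For Each f As String In files
            Dim info As New FileInfo(f)
            total += info.Length
        Next
        Return total
    End Function

    ' True when the drive holding the staging folder has room for the copies
    Public Function HasRoomFor(ByVal stagingFolder As String, ByVal files As String()) As Boolean
        Dim drive As New DriveInfo(Path.GetPathRoot(Path.GetFullPath(stagingFolder)))
        Return drive.AvailableFreeSpace >= BytesNeeded(files)
    End Function
End Module
```

ZipOperation could call HasRoomFor before the first copy and report a shortfall through ZipError instead of starting the operation.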
Current bugs
There are a few 'bugs' in our current code as given. In some places, canceling
a dialog causes an error and closes the ZIP file. In other places, operations
appear to succeed when the circumstances actually make them impossible, because
exceptions are not handled at all. Some of this is intentional, some
accidental. We have also left some of the code unexplained here, but the
in-file comments should be sufficient to help you understand what's happening.
The code is there for you to play with, change or fix. Have fun!
Sujay V Sarma