Working with zip and similar files

Hey guys, I'm wondering if anyone knows of any functions in Windows (the preferable option) or any external libraries designed for working extensively with zip/rar/7z/etc. archives.

I need to be able to get an array or list of all the file names, and some way to access the files themselves for reading and writing (preferably exclusively), so that I can retrieve files and open them both as raw data and as printed ASCII/Unicode characters.

If possible I'd like a library that isn't picky about the actual extension name but will just try and read the encoding type instead. (I'd like to store object data in files in a zip folder and perhaps rename the extension just to put a few people off... or perhaps to indicate to other parts of my program exactly what that folder is for.)

I know this is really picky, but compromises can be made, and I'd be most grateful if anybody knows of ways of doing this in Windows; if not, an external library is my second choice.
Thanks for any help or suggestions.
They are all different formats. I believe if you download 7-Zip it has a DLL that you can use to handle those kinds of things.

Windows only understands ZIP (and a couple of other things that won't matter to you), so you'll want an external library.

Good luck!
Would anyone have any ideas how I might go about creating my own type of zip style file?
As you've mentioned, there are many different archive formats. I can understand how I'd create my own file type, but how would I create a file type that stores files rather than raw data?
A "file" is just a list of bytes. We'll call each file "data".

Your archive file type need only be very simple. First, you need a directory structure, where each entry is a record that
- names the data (a "filename")
- locates the beginning of the data in the archive file (this is an unsigned integer index to the first byte of data)
- gives the length of the data in bytes
- tells what kind of compression (including "none") is used on the data
- optionally provides a checksum or some other method to validate the data has not been corrupted
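For illustration, one such directory record might look like this in C++. The names `DirEntry` and `Compression`, the compression codes, and the choice of Adler-32 (a cheap checksum that zlib also uses) for the optional validation field are all assumptions made for this sketch, not part of any existing format.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical compression codes: "none" must exist, the rest are up to you.
enum class Compression : std::uint8_t { None = 0, Deflate = 1 };

// One directory record, following the list above.
struct DirEntry {
    std::string name;                             // the "filename"
    std::uint64_t offset = 0;                     // index of the data's first byte in the archive
    std::uint64_t length = 0;                     // length of the data in bytes
    Compression compression = Compression::None;  // how the data is stored
    std::uint32_t checksum = 0;                   // optional: Adler-32 of the original data
};

// Adler-32: two running sums modulo 65521, packed as (b << 16) | a.
std::uint32_t adler32(const unsigned char* data, std::size_t len) {
    const std::uint32_t kMod = 65521;
    std::uint32_t a = 1, b = 0;
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % kMod;
        b = (b + a) % kMod;
    }
    return (b << 16) | a;
}
```

When reading an entry back, recompute the checksum over the extracted bytes and compare it with the stored value to detect corruption.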

Your file need only
- identify itself (usually the first few bytes will be a unique identification code)
- give the number of entries in the directory
- then the directory itself
- then each block of data
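A minimal in-memory sketch of that layout, assuming a made-up 4-byte identification code `MYAR`, little-endian integers, and no compression (every entry is stored as-is). All names here are hypothetical; a real format would probably also want a version number plus the compression field and checksum from the first list, and would seek within a file rather than hold the whole archive in memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical identification code for this made-up format.
const unsigned char kMagic[4] = {'M', 'Y', 'A', 'R'};

struct Entry {
    std::string name;
    std::uint64_t offset;  // first byte of the data, counted from the start of the archive
    std::uint64_t length;  // data length in bytes
};

// Little-endian integer helpers, so the layout doesn't depend on the host CPU.
void putLE(Bytes& out, std::uint64_t v, int n) {
    for (int i = 0; i < n; ++i) out.push_back(static_cast<unsigned char>(v >> (8 * i)));
}
std::uint64_t getLE(const Bytes& in, std::size_t& pos, int n) {
    if (pos + n > in.size()) throw std::runtime_error("truncated archive");
    std::uint64_t v = 0;
    for (int i = 0; i < n; ++i) v |= std::uint64_t(in[pos + i]) << (8 * i);
    pos += n;
    return v;
}

// Layout: magic | uint32 entry count | directory | data blocks (stored, uncompressed).
// Each directory record: uint16 name length | name bytes | uint64 offset | uint64 length.
Bytes writeArchive(const std::vector<std::pair<std::string, Bytes>>& files) {
    std::uint64_t offset = 4 + 4;  // magic + count
    for (const auto& f : files) {
        if (f.first.size() > 0xFFFF) throw std::length_error("name too long");
        offset += 2 + f.first.size() + 8 + 8;  // data begins after the whole directory
    }
    Bytes out(kMagic, kMagic + 4);
    putLE(out, files.size(), 4);
    for (const auto& f : files) {
        putLE(out, f.first.size(), 2);
        out.insert(out.end(), f.first.begin(), f.first.end());
        putLE(out, offset, 8);
        putLE(out, f.second.size(), 8);
        offset += f.second.size();
    }
    for (const auto& f : files) out.insert(out.end(), f.second.begin(), f.second.end());
    return out;
}

// Reads the directory back: this is the "list of all the file names".
std::vector<Entry> readDirectory(const Bytes& in) {
    if (in.size() < 8 || std::memcmp(in.data(), kMagic, 4) != 0)
        throw std::runtime_error("not an archive of this type");
    std::size_t pos = 4;
    const std::uint64_t count = getLE(in, pos, 4);
    std::vector<Entry> entries;
    for (std::uint64_t i = 0; i < count; ++i) {
        const std::size_t nameLen = getLE(in, pos, 2);
        if (pos + nameLen > in.size()) throw std::runtime_error("truncated archive");
        std::string name(in.begin() + pos, in.begin() + pos + nameLen);
        pos += nameLen;
        const std::uint64_t off = getLE(in, pos, 8);
        const std::uint64_t len = getLE(in, pos, 8);
        entries.push_back({name, off, len});
    }
    return entries;
}

// Copies one entry's bytes out of the archive.
Bytes extract(const Bytes& in, const Entry& e) {
    if (e.offset > in.size() || e.length > in.size() - e.offset)
        throw std::runtime_error("entry points outside the archive");
    return Bytes(in.begin() + e.offset, in.begin() + e.offset + e.length);
}
```

Because the reader checks the identification code rather than the file name, renaming the archive's extension makes no difference to it.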

All archive file types are a variation on that. Zip, for example, keeps the directory structure at the end of the file.
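Because Zip keeps its directory (the "central directory") at the end, a Zip reader typically starts by locating the End of Central Directory (EOCD) record near the end of the file. A rough C++ sketch of that step, going by the EOCD layout in PKWARE's APPNOTE.TXT (22 fixed bytes followed by a comment of up to 65535 bytes); it ignores Zip64 and multi-disk archives, and `findEocd`/`ZipEocd` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The End of Central Directory fields this sketch cares about.
struct ZipEocd {
    std::uint16_t totalEntries;      // number of files listed in the central directory
    std::uint32_t centralDirSize;    // size of the central directory in bytes
    std::uint32_t centralDirOffset;  // where the central directory starts in the file
};

// Reads an n-byte little-endian integer at position p.
std::uint32_t readLE(const std::vector<unsigned char>& b, std::size_t p, int n) {
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i) v |= std::uint32_t(b[p + i]) << (8 * i);
    return v;
}

// Scans backwards from the end for the EOCD signature "PK\x05\x06".
// Returns false if no plausible EOCD record is found.
bool findEocd(const std::vector<unsigned char>& zip, ZipEocd& out) {
    const std::size_t kFixed = 22;
    if (zip.size() < kFixed) return false;
    const std::size_t last = zip.size() - kFixed;  // latest possible start
    const std::size_t first = last > 0xFFFF ? last - 0xFFFF : 0;
    for (std::size_t p = last + 1; p-- > first;) {
        if (zip[p] != 0x50 || zip[p + 1] != 0x4B || zip[p + 2] != 0x05 || zip[p + 3] != 0x06)
            continue;
        // Sanity check: the comment length (offset 20) must account for the rest of
        // the file, which helps reject signature bytes that appear in the data by chance.
        if (p + kFixed + readLE(zip, p + 20, 2) != zip.size()) continue;
        out.totalEntries = static_cast<std::uint16_t>(readLE(zip, p + 10, 2));
        out.centralDirSize = readLE(zip, p + 12, 4);
        out.centralDirOffset = readLE(zip, p + 16, 4);
        return true;
    }
    return false;
}
```

From `centralDirOffset` a reader would then walk the central directory records to get the file names, sizes and offsets.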

The Unix utility program tar does essentially this - wraps a bunch of files up into a single file. It does not compress the data.
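To make that concrete: in the POSIX ustar layout each member is a 512-byte header (name in the first 100 bytes, size as octal ASCII digits in the 12-byte field at offset 124) followed by the data, padded to a multiple of 512 bytes. So rather than reading a directory up front, a reader walks from header to header. A simplified C++ sketch that ignores header checksums, the ustar name-prefix field and GNU long-name extensions; `listTar` and `TarMember` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct TarMember {
    std::string name;
    std::uint64_t size;      // bytes of file data
    std::size_t dataOffset;  // where that data starts in the archive
};

// Walks a ustar-style tar archive member by member: each member is a 512-byte
// header followed by its data, padded up to a multiple of 512 bytes.
std::vector<TarMember> listTar(const std::vector<unsigned char>& tar) {
    std::vector<TarMember> members;
    std::size_t pos = 0;
    // Simplification: stop at a block whose first byte is zero
    // (a real archive ends with two all-zero blocks).
    while (pos + 512 <= tar.size() && tar[pos] != 0) {
        const unsigned char* h = tar.data() + pos;
        // Name: up to 100 bytes at offset 0, NUL-terminated if shorter.
        std::size_t n = 0;
        while (n < 100 && h[n] != 0) ++n;
        std::string name(reinterpret_cast<const char*>(h), n);
        // Size: octal ASCII digits in the 12-byte field at offset 124
        // (some writers pad with leading spaces).
        int i = 124;
        while (i < 136 && h[i] == ' ') ++i;
        std::uint64_t size = 0;
        for (; i < 136 && h[i] >= '0' && h[i] <= '7'; ++i) size = size * 8 + (h[i] - '0');
        members.push_back({name, size, pos + 512});
        pos += 512 + static_cast<std::size_t>((size + 511) / 512 * 512);
    }
    return members;
}
```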

Things like Zip and 7z and whatever else you have in mind do compress the data they are storing.

Hope this helps.
Thanks a lot.
I had the idea that if I were to write my own, it would essentially be just one file in itself, and the program associated with it would read the information stored in that one file and display it in a GUI as a file system.
Out of interest, did you want/need to handle a number of different archive types, or do you just need to use one? If you just need to choose and use one, then you could use the built-in support.

The Win32 functions Duoas referred to are part of the File Management API: LZInit, LZOpenFile, LZRead, etc.

Edit: Note that the Win32 LZ functions are provided to support decompressing single compressed files (files compressed with the Lempel-Ziv algorithm, such as those produced by Microsoft's COMPRESS.EXE utility). NTFS, separately, uses the LZNT1 algorithm (a variant of LZ77) for file compression on a per-file basis. While it is no doubt possible to use LZSeek and LZRead to decompress a structured archive (i.e. a compressed directory subtree of files), it would be better to use a decent zip library for this purpose. If you went with the LZ functions, you'd be on your own when it comes to handling the directory structure, etc.

File Management Functions
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364232%28v=vs.85%29.aspx

But if you need to handle a wider range of file types, there's the 7-Zip SDK (also alluded to above)

LZMA SDK (Software Development Kit)
http://www.7-zip.org/sdk.html

plus assorted other libraries (you'll probably need more than one to cover all of the popular archive file types.)

libarchive
http://people.freebsd.org/~kientzle/libarchive/
including .tar, .tar.gz, .tar.bz2

liblzma
http://tukaani.org/lzma
.xz and .lzma

libZip
http://www.nih.at/libzip
.zip

zlib
http://www.zlib.net
.gz

will just try and read the encoding type

The various compression schemes use a magic number as part of their file header, so you could check that to work out what the file probably is. But there's always a slim chance that the value is there randomly, of course. And I think (older) .zip files do turn up without a magic number, just to complicate things.
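A sketch of that check in C++, comparing the first bytes of a file against a few well-known signatures (ZIP, RAR, 7z, gzip, xz, bzip2, plus the "ustar" marker that ustar tar headers carry at offset 257). `sniffFormat` is a hypothetical name and the list is not exhaustive; the older .lzma format has no reliable magic number, so it is not listed. Treat a match as a strong hint, not proof, and note that some valid ZIPs (self-extracting .exe files, for example) have other data in front of the ZIP records.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Guesses an archive format from its leading bytes, ignoring the extension.
std::string sniffFormat(const std::vector<unsigned char>& head) {
    struct Sig { const char* name; std::vector<unsigned char> bytes; };
    static const std::vector<Sig> sigs = {
        {"zip",   {0x50, 0x4B, 0x03, 0x04}},              // "PK\3\4": local file header
        {"zip",   {0x50, 0x4B, 0x05, 0x06}},              // "PK\5\6": empty archive
        {"rar",   {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07}},  // "Rar!\x1A\x07"
        {"7z",    {0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C}},  // "7z\xBC\xAF\x27\x1C"
        {"gzip",  {0x1F, 0x8B}},
        {"xz",    {0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00}},  // "\xFD" "7zXZ\0"
        {"bzip2", {0x42, 0x5A, 0x68}},                    // "BZh"
    };
    for (const auto& s : sigs) {
        if (head.size() >= s.bytes.size() &&
            std::equal(s.bytes.begin(), s.bytes.end(), head.begin()))
            return s.name;
    }
    // tar has nothing useful at offset 0, but ustar headers carry "ustar" at offset 257.
    static const std::string ustar = "ustar";
    if (head.size() >= 257 + ustar.size() &&
        std::equal(ustar.begin(), ustar.end(), head.begin() + 257))
        return "tar";
    return "unknown";
}
```

Reading the first 512 bytes of a file is enough for every check above.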

any ideas how I might go about creating my own type of zip style file

If you are thinking about implementing your own type of archive, Windows does provide a mechanism that allows you to treat a file as a file system in its own right: Structured Storage. This API is fairly involved, and it is also a COM-based (Component Object Model) API, which could be a bit of a headache if you haven't come across COM before.

COM Structured Storage
http://en.wikipedia.org/wiki/COM_Structured_Storage
(this article mentions other structured storage libraries)

Structured Storage
http://msdn.microsoft.com/en-us/library/windows/desktop/aa380369%28v=vs.85%29.aspx

Your description of what you want to do does sound a bit like the GUI tools that come with 7-Zip and WinZip?

Andy

Component Object Model
http://en.wikipedia.org/wiki/Component_Object_Model

COM: Component Object Model Technologies
http://www.microsoft.com/com/default.mspx
Well that is a big helpful information dump.
I'm sure whatever I may need will be here somewhere, but I really only need one type of archive; I was just curious as to how I'd read all the different ones currently available.
I was intentionally describing what I was looking for as sort of a GUI because I imagine this would be one of the easier ways of looking at it.

Doing some googling myself, I found the ZipFile class on MSDN
http://msdn.microsoft.com/en-us/library/system.io.compression.zipfile.aspx

And if I couldn't find anything else that suited me, I'd just use that along with the functions for dealing with files/directories in the System::IO namespace
http://msdn.microsoft.com/en-us/library/system.io.aspx

Anyway, I'm sure I'll have more than enough info here to continue my work; if not, it's given me a very good starting point for googling!
Cheers guys
If you are curious about starting your own, you should check out some open source projects. My favorite open source archiver at the moment is FreeArc.
http://freearc.org/Download.aspx

There was this KGB Archiver a while back that achieved very good compression ratios at the cost of a lot of time. It had a source code download as well, but their website seems to be offline. I did manage to find it on the Internet Archive, though.
http://web.archive.org/web/20110106120649/http://kgbarchiver.net/
Added a note about the limitations of the Win32 LZ functions to my previous post in this thread.

Andy