DESIGN AND IMPLEMENTATION OF A DATA COMPRESSION SOFTWARE
CHAPTER ONE
GENERAL INTRODUCTION
1.1.0 INTRODUCTION
In recent times, there has been a growing need to maximize data transfer
between communication terminals while making efficient use of network
bandwidth and disk space. Compression is the process used to reduce the
physical size of a block of information. Data encoding is the general term for
algorithms that transform data from one representation into another, and data
compression is a type of data encoding. Doyle and Carlson (2000) write that
data compression “has one of the most simple and elegant design theories in
all engineering”. A simple characterization of data compression is that it
involves transforming a string of characters in some representation (such as
ASCII) into a new string (of bits, for example) which contains the same
information but whose length is as small as possible. Data compression
squeezes data so that it requires less disk space for storage and less
bandwidth on a data transmission channel. Communication equipment such as
modems, bridges and routers uses compression schemes to improve throughput
over standard leased lines or phone lines.
File compression can be employed at various levels: a user can choose to
compress individual files, a whole folder or an entire drive. Most compression
schemes take advantage of the fact that data contains a lot of repetition. For
example, alphanumeric characters are normally represented by a 7-bit ASCII
code, but a compression scheme can use a shorter code, such as 3 bits, to
represent the most common letters. Compressed files are often stored in
archives, and an archive can contain more than one file. Archive files are
manipulated with utilities such as WinZip or IZArc.
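The idea of giving shorter codes to more frequent characters can be
illustrated with a minimal sketch of Huffman coding. Huffman coding is chosen
here only as one classic example of such a scheme; the text above does not name
a specific algorithm, and the function names (huffman_codes, encode, decode)
and the sample string are assumptions made for illustration.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each symbol in text to a prefix-free bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker ensures the dicts are never compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one more bit to every code in them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the bit string of every character in text."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Rebuild the text; this works because no code is a prefix of another."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    sample = "aaaaaaaabbbbccd"  # hypothetical input with skewed frequencies
    codes = huffman_codes(sample)
    bits = encode(sample, codes)
    print(codes)
    print(len(bits), "bits instead of", 7 * len(sample), "bits in 7-bit ASCII")
```

In this sketch the most frequent character receives the shortest code, so the
encoded string is shorter than a fixed 7-bit encoding whenever the character
frequencies are sufficiently uneven.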
1.2.0 PROBLEM DEFINITION
Many sources of information contain redundant data or data that adds little to
the stored information. This results in a tremendous amount of data being
transferred between client and server applications. Large volumes of
information often have to be transferred over a communication channel and, if
not compressed, this information also requires a lot of disk space for
storage. Similarly, large volumes of information require large bandwidth on a
transmission channel. Bandwidth is measured in bits per second, and higher
bandwidth is more costly. A large chunk of information also requires more
transmission time than a smaller one. All these factors gave rise to the need
for compression.
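The relationship between data size, bandwidth and transmission time described
above can be made concrete with a small calculation. This is an idealized
sketch: it ignores latency, protocol overhead and retransmissions, and the file
size, compressed size and link speed are hypothetical values chosen for
illustration.

```python
def transmission_time_seconds(size_bytes, bandwidth_bps):
    """Idealized time to send size_bytes over a link of bandwidth_bps bits/second.

    Ignores latency, protocol overhead and retransmissions (an assumption).
    """
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return (size_bytes * 8) / bandwidth_bps


if __name__ == "__main__":
    original = 10_000_000        # hypothetical 10 MB file
    compressed = original // 2   # assumed to compress to half its size
    link = 1_000_000             # hypothetical 1 Mbit/s channel
    print(transmission_time_seconds(original, link))
    print(transmission_time_seconds(compressed, link))
```

Under these assumptions, halving the amount of data halves the time needed to
send it over the same channel.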
1.3.0 OBJECTIVE OF THE STUDY
There are many reasons for data compression; the main aim of data compression
is to reduce storage requirements by reducing redundancy. When the amount of
data to be transmitted is reduced, the effect is that of increasing the
capacity of the communication channel. Similarly, compressing a file to half
its original size is equivalent to doubling the capacity of the storage
medium. It may then become feasible to store the data at a higher, and thus
faster, level of the storage hierarchy and to reduce the load on the
input/output channels of the computer system.
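The claim that halving a file's size is equivalent to doubling the capacity of
the storage medium follows from simple arithmetic, sketched below. The
definition of compression ratio used here (compressed size divided by original
size) is an assumption; some texts use the inverse. The function names and
figures are hypothetical.

```python
def compression_ratio(original_size, compressed_size):
    """Compressed size as a fraction of the original size (smaller is better)."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return compressed_size / original_size


def effective_capacity(physical_capacity, ratio):
    """Amount of original (uncompressed) data that fits when stored compressed."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return physical_capacity / ratio


if __name__ == "__main__":
    ratio = compression_ratio(200, 100)    # file compressed to half its size
    print(effective_capacity(500, ratio))  # hypothetical 500-unit medium
```

With a ratio of 0.5, a medium of a given physical capacity holds twice as much
original data, which is the doubling described above.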
One objective of this project is to achieve faster file transfer while using
less bandwidth on a data transmission channel. In data communication,
transferring compressed data over a medium increases the effective rate of
information transfer. This is another aim of file compression.
Basically, source coding for data compression is a method utilized in data
systems to reduce the volume of digital data in order to achieve benefits in
areas including, but not limited to:
(a) Reduction of the transmission channel bandwidth
(b) Reduction of the buffering and storage requirements
(c) Reduction of data transmission time at a given rate
Thus, at the end of this project, I should be able to develop a data/file
compression and decompression utility that aids easy transfer of data.
1.4.0 RESEARCH JUSTIFICATION
Bandwidth is used as a synonym for data transfer rate (DTR), which is the
amount of digital data that is moved from one place to another in a given time.
It can be viewed as the speed of travel of a given amount of data from one
point to another. In general, the greater the bandwidth of a given path, the
higher the transfer rate. Bandwidth, together with other resources such as disk
space, time and money, which are essential in networking, forms the motivation
for this project.
This project is relevant to all users of the internet and indeed all users of
computer systems, as compression allows more data to be stored and transferred
with the same resources.
1.5.0 RESEARCH METHODOLOGY
Data compression can be implemented on existing hardware by software or
through the use of special hardware devices that incorporate compression
techniques. The efficiency of a compression utility also depends on the
specific algorithm used by the compression program. It is possible to compress
and decompress data using tools such as WinZip, Gzip and the Java archive (jar)
tool, but these are used as standalone applications. The WinZip tool is used to
create a compressed archive and to extract files from a compressed archive on
Windows. On UNIX, tar is used to create an archive file and the gzip command is
then used to compress it. Compression techniques are also classified as lossy
or lossless. Lossless data compression is able to restore the decompressed data
exactly to its original form. In lossy compression, on the other hand, the
decompressed data may differ from the original data. WinZip is an example of a
tool that uses lossless compression, and JPEG is an example of lossy
compression. Lossy compression methods typically offer a three-way trade-off
between compression speed, compressed data size and quality.
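The difference between the two classes can be sketched in a few lines. The
lossless part below uses zlib from the Python standard library, which
implements DEFLATE, the algorithm family also used by gzip and ZIP archives.
The lossy part is only a toy quantization step written for illustration; it is
not how JPEG works. The function names and sample data are assumptions.

```python
import zlib


def lossless_round_trip(data, level=9):
    """Compress bytes with zlib and decompress them again.

    Returns (compressed, restored); restored always equals data.
    """
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    return compressed, restored


def toy_lossy(samples, step=16):
    """Toy lossy step: round each sample down to a multiple of step.

    The discarded detail cannot be recovered afterwards.
    """
    return [(s // step) * step for s in samples]


if __name__ == "__main__":
    data = b"ABAB" * 256  # hypothetical, highly repetitive input
    compressed, restored = lossless_round_trip(data)
    print(len(data), "->", len(compressed), "bytes; identical:", restored == data)

    samples = [3, 18, 33]  # hypothetical sample values
    print(samples, "->", toy_lossy(samples))
```

The lossless round trip returns exactly the original bytes, whereas the
quantized samples only approximate the originals, which is why lossy methods
involve a trade-off with quality.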
In this project, it is intended that a lossless algorithm shall be used to
create a compression utility like WinZip and Gzip, in order to reduce the high
use of internet bandwidth and ease the problem of low disk space, thereby
effectively increasing the capacity of the storage medium and aiding easy file
transfer.
1.6.0 SCOPE AND LIMITATION OF STUDY
The scope of the study, the design and implementation of a file compression
utility, is based on the study of already existing compression utilities and
compression algorithms. This shall lead to the introduction of a new
compression utility that helps to conserve resources such as storage space,
bandwidth, time and money.
The limitations that will hinder effective implementation of this project
include:
(a) The scope is centered on a basic implementation of the compression and
decompression processes.
(b) The project implementation will not take into consideration the low-level
details and technicalities involved in creating a compression utility, but will
focus on employing various pre-existing APIs (Application Programming
Interfaces) and libraries in order to create such a utility.
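As an illustration of building a utility on top of pre-existing libraries
rather than low-level code, the sketch below uses the zipfile module from the
Python standard library. It is a minimal example under stated assumptions, not
the implementation developed in this project: the language choice, the function
names (compress_files, decompress_archive) and the file names are hypothetical.

```python
import os
import zipfile


def compress_files(paths, archive_path):
    """Write the given files into a ZIP archive using DEFLATE compression."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # Store each file under its base name only, not its full path.
            zf.write(path, arcname=os.path.basename(path))
    return archive_path


def decompress_archive(archive_path, dest_dir):
    """Extract every member of the archive into dest_dir and list their names."""
    with zipfile.ZipFile(archive_path, "r") as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


if __name__ == "__main__":
    # Hypothetical file names; report.txt must already exist for this to run.
    archive = compress_files(["report.txt"], "report.zip")
    print(decompress_archive(archive, "extracted"))
```

The library handles the archive format and the compression algorithm, so the
utility itself only has to decide which files to add and where to extract
them.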
1.7.0 DEFINITION OF TERMS
ALGORITHM: a set of instructions followed in a fixed order
and used to solve a problem or perform a computation.
THROUGHPUT: the amount of work, goods or people that are
dealt with in a particular period of time.
STUB: a small placeholder piece of code that stands in for functionality
that is implemented elsewhere or not yet implemented.
UTILITY: a piece of computer software that has a particular
use.
PROLIFERATION: a sudden increase in the amount or number of
something.
ENSEMBLE: a set of things that go together to form a whole.
PERMUTE: to subject to a process of alteration, rearrangement
or permutation.
PREMISE: a previous statement from which another is
inferred.
PATENT: a special document that gives you the right to make or sell a new
invention or product that no one else is allowed to copy.
ENCRYPTION: a process of securing information on the
computer using special codes that only some people can read.
METADATA: information that describes other data, such as what is contained in
large computer databases.
INNOCUOUS: something that is not likely to cause harm to
anyone or to cause trouble.