42.2 System Requirements
The first step is to determine the maximum network load the system must
bear. Pay particular attention to load peaks, because these can be more
than four times the day's average. When in doubt, overestimate the system's
requirements, because running Squid close to the limit of its capabilities
can lead to a severe loss in the quality of the service. The following
sections cover the system factors in order of significance.
42.2.1 Hard Disks
Speed plays an important role in the caching process, so this factor
deserves special attention. For hard disks, this parameter is described as
random seek time, measured in milliseconds. Because
the data blocks that Squid reads from or writes to the hard disk tend to be
rather small, the seek time of the hard disk is more important than its
data throughput. For the purposes of a proxy, hard disks with high rotation
speeds are probably the better choice, because they allow the read-write
head to be positioned in the required spot more quickly. One way to speed
up the system is to use several disks concurrently or to employ striping
RAID arrays.
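The effect of seek time and rotation speed on small random accesses can be
illustrated with a rough sketch. It assumes the common approximation that
the average access time is the seek time plus half a revolution. The
function names and the two drive profiles are hypothetical and chosen only
for illustration; they do not come from Squid or from any specific disk.

```python
def avg_access_ms(seek_ms: float, rpm: int) -> float:
    """Approximate average random access time in milliseconds.

    Assumption: access time = seek time + half a revolution.
    """
    rotational_latency_ms = 0.5 * 60_000 / rpm  # half a turn, in ms
    return seek_ms + rotational_latency_ms


def random_ops_per_second(seek_ms: float, rpm: int) -> float:
    """Upper bound on small random reads or writes per second for one disk."""
    return 1000 / avg_access_ms(seek_ms, rpm)


if __name__ == "__main__":
    # Hypothetical drive profiles, for comparison only.
    for label, seek_ms, rpm in [("7200 rpm", 8.5, 7200), ("15000 rpm", 3.5, 15000)]:
        ops = random_ops_per_second(seek_ms, rpm)
        print(f"{label}: {avg_access_ms(seek_ms, rpm):.1f} ms, about {ops:.0f} ops/s")
```

Under these assumed figures, the faster-spinning disk serves more than twice
as many small random requests per second, even though its sequential
throughput plays no part in the calculation.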
42.2.2 Size of the Disk Cache
In a small cache, the probability of a HIT (finding the requested object
already located there) is small, because the cache fills up quickly and
the less frequently requested objects are replaced by newer ones.
Conversely, an oversized cache is of little use. If, for example, one GB
is available for the cache and the users only surf ten MB per day, it
would take more than one hundred days to fill the cache.
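The fill-time arithmetic in this example can be written as a small sketch.
The function name is hypothetical, decimal units (1 GB = 1000 MB) are
assumed, and all downloaded data is assumed to be new and cacheable.

```python
def days_to_fill(cache_mb: float, daily_new_mb: float) -> float:
    """Days until the cache is full if every downloaded byte is stored.

    Assumption: no object is requested twice and nothing expires early.
    """
    if daily_new_mb <= 0:
        raise ValueError("daily_new_mb must be positive")
    return cache_mb / daily_new_mb


if __name__ == "__main__":
    # 1 GB cache (taken as 1000 MB), 10 MB of new data per day.
    print(days_to_fill(1000, 10))  # 100.0
```

With binary units (1 GB = 1024 MB), the same call returns slightly more
than one hundred days.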
The easiest way to determine the needed cache size is to consider the
maximum transfer rate of the connection. With a 1 Mbit/s
connection, the maximum transfer rate is 125 KB/s. If
all this traffic ends up in the cache, in one hour it would add up to
450 MB and, assuming that all this traffic is generated
in only eight working hours, it would reach 3.6 GB in
one day. Because the connection is normally not used to its upper volume
limit, it can be assumed that the total data volume handled by the cache is
approximately 2 GB. In this example, Squid therefore needs 2 GB of disk
space to keep one day's worth of browsed data cached.
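The sizing calculation above can be reproduced with a short sketch. The
function name and the utilization parameter are assumptions made for
illustration. The text only states that the link is not normally used to
its limit, so the value 0.55 below is a guess that lands near the 2 GB
figure, not a measured number.

```python
def daily_cache_volume_mb(link_mbit_s: float,
                          hours_per_day: float = 8,
                          utilization: float = 1.0) -> float:
    """Data volume in MB passing through the proxy in one day.

    Uses decimal units: 1 Mbit/s = 125 KB/s and 1000 KB = 1 MB.
    utilization is the assumed fraction of link capacity actually used.
    """
    kb_per_second = link_mbit_s * 1000 / 8
    mb_per_hour = kb_per_second * 3600 / 1000
    return mb_per_hour * hours_per_day * utilization


if __name__ == "__main__":
    print(daily_cache_volume_mb(1))                    # 3600.0 (upper limit)
    print(round(daily_cache_volume_mb(1, 8, 0.55)))    # roughly 2 GB, in MB
```

Changing link_mbit_s or hours_per_day adapts the estimate to other
connections and usage patterns.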
42.2.3 RAM
The amount of memory (RAM) required by Squid directly correlates to the
number of objects in the cache. Squid also stores cache object references
and frequently requested objects in the main memory to speed up retrieval
of this data, because random access memory is much faster than a hard disk.
In addition, Squid needs to keep other data in memory, such as a table
with all the IP addresses handled, an exact domain name cache, access
control lists, buffers, and more.
It is very important to have sufficient memory for the Squid process,
because system performance is dramatically reduced if it must be swapped
to disk. The cachemgr.cgi tool can be used for cache memory management.
This tool is introduced in Section 42.6, "cachemgr.cgi".
Sites with huge network traffic should consider using an AMD64
or Intel EM64T system with more than 4 GB of memory.
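Because memory use scales with the number of cached objects, a rough
estimate of the memory needed for object references can be sketched as
follows. Both default values (mean object size and bytes of memory per
object reference) are placeholder assumptions, not figures from this
chapter. They should be replaced with values measured on the actual system.
The estimate also ignores the other in-memory data listed above.

```python
def index_ram_mb(cache_gb: float,
                 mean_object_kb: float = 13.0,
                 bytes_per_entry: float = 100.0) -> float:
    """Rough RAM in MB needed for references to all objects on disk.

    mean_object_kb and bytes_per_entry are placeholder assumptions.
    Decimal units are used: 1 GB = 1,000,000 KB and 1 MB = 1,000,000 bytes.
    """
    if mean_object_kb <= 0:
        raise ValueError("mean_object_kb must be positive")
    object_count = cache_gb * 1_000_000 / mean_object_kb
    return object_count * bytes_per_entry / 1_000_000


if __name__ == "__main__":
    # The 2 GB disk cache from the sizing example, with placeholder defaults.
    print(f"about {index_ram_mb(2):.0f} MB for object references")
```

The result grows linearly with the disk cache size, which is why a larger
disk cache also calls for more memory.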
42.2.4 CPU
Squid is not a CPU-intensive program. The processor load only increases
while the contents of the cache are loaded or checked. Using a
multiprocessor machine does not increase the performance of the system. To
increase efficiency, it is better to buy faster disks or add more memory.