42.1 Some Facts about Proxy Caches
As a proxy cache, Squid can be used in several ways. When
combined with a firewall, it can help with security. Multiple
proxies can be used together. It can also determine what types of
objects should be cached and for how long.
42.1.1 Squid and Security
Squid can be combined with a firewall to shield internal networks from
the outside. The firewall denies all clients direct access to external
services; only Squid is allowed through. All Web connections must then be
established by the proxy. With this configuration, Squid completely
controls Web access.
If the firewall configuration includes a DMZ, the proxy should operate
within this zone. Section 42.5, "Configuring a Transparent Proxy" describes
how to implement a transparent proxy. This simplifies the configuration of
the clients, because in this case they do not need any information about
the proxy.
42.1.2 Multiple Caches
Several instances of Squid can be configured to exchange objects between
them. This reduces the total system load and increases the chance of
finding an object that already exists in the local network. It is also
possible to configure cache hierarchies, so that a cache can forward
object requests to sibling caches or to a parent cache. The object is then
fetched from another cache in the local network or directly from the
source.
Choosing the appropriate topology for the cache hierarchy is very important,
because it is not desirable to increase the overall traffic on the
network. For a
very large network, it would make sense to configure a proxy server for
every subnetwork and connect them to a parent proxy, which in turn is
connected to the proxy cache of the ISP.
All this communication is handled by ICP (Internet Cache Protocol) running
on top of UDP. Data transfers between caches are handled using
HTTP (Hypertext Transfer Protocol) based on TCP.
To find the most appropriate server from which to get the objects, one
cache sends an ICP request to all sibling proxies. These answer the
request via ICP responses with a HIT code if they hold the object or a
MISS if they do not. If multiple HIT responses are received, the proxy
server decides from which server to download, depending on factors such as
which cache sent the fastest answer or which one is closer. If no
satisfactory responses are received, the request is sent to the
parent cache.
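The selection step described above can be illustrated with a short Python
sketch. It is not Squid's implementation: the IcpReply type, the peer names,
and the round-trip times are hypothetical, and "fastest HIT wins" is only one
of the factors mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IcpReply:
    peer: str      # hypothetical sibling name
    hit: bool      # True for an ICP HIT, False for a MISS
    rtt_ms: float  # measured round-trip time of the reply (assumed metric)


def choose_source(replies: List[IcpReply], parent: str) -> str:
    """Return the sibling with the fastest HIT, or the parent if none hit."""
    hits = [reply for reply in replies if reply.hit]
    if hits:
        return min(hits, key=lambda reply: reply.rtt_ms).peer
    # No sibling holds the object: fall back to the parent cache.
    return parent


replies = [
    IcpReply("sibling-a", hit=False, rtt_ms=2.0),
    IcpReply("sibling-b", hit=True, rtt_ms=9.5),
    IcpReply("sibling-c", hit=True, rtt_ms=4.1),
]
print(choose_source(replies, parent="parent-cache"))  # sibling-c
```

The MISS from sibling-a is ignored even though it arrived first, because only
HIT responses identify a cache that can deliver the object.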
HINT:
To avoid duplication of objects in different caches in the network, other
protocols are used, such as CARP (Cache Array Routing Protocol) or
HTCP (Hypertext Caching Protocol). The more objects maintained in the
network, the greater the possibility of finding the desired one.
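Hash-based routing is one way to avoid duplicates: every URL is mapped to
exactly one cache, so the object is stored only there. The following Python
sketch shows the idea with a highest-score hash. It is an illustration only:
SHA-256 is used for convenience and is not the hash function that CARP
specifies, and the cache names and URL are hypothetical.

```python
import hashlib
from typing import List


def pick_cache(url: str, caches: List[str]) -> str:
    """Return the cache with the highest combined hash score for this URL."""

    def score(cache: str) -> int:
        # Combine cache name and URL so each pair gets its own score.
        digest = hashlib.sha256(f"{cache}|{url}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(caches, key=score)


caches = ["cache-1", "cache-2", "cache-3"]  # hypothetical array members
url = "http://www.example.com/index.html"
owner = pick_cache(url, caches)
print(owner)
```

Because the score depends only on the cache name and the URL, every proxy
that knows the same list of members selects the same cache for a given URL,
and removing a different member does not change that choice.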
42.1.3 Caching Internet Objects
Not all objects available in the network are static. There are many
dynamically generated CGI pages, visitor counters, and SSL-encrypted
documents. Objects like these are not cached because they
change each time they are accessed.
The question remains as to how long all the other objects stored in the
cache should stay there. To determine this, all objects in the cache are
assigned one of various possible states.
Web and proxy servers indicate the status of an object by adding headers
to it, such as Last-Modified or Expires, together with the corresponding
date. Other headers specifying that objects must not be cached are used as
well.
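As a rough illustration of how such headers can be evaluated, the following
Python sketch checks whether a stored object may still be served. It is a
simplification and not the logic Squid applies: only the Expires header and
the Cache-Control value no-store are considered, and the is_fresh function
and the sample headers are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers: dict, now: datetime) -> bool:
    """Return True only if caching is allowed and Expires is in the future."""
    # A header that forbids caching overrides everything else.
    if "no-store" in headers.get("Cache-Control", "").lower():
        return False
    expires = headers.get("Expires")
    if expires is None:
        return False  # no expiry information: treat as not fresh
    try:
        return parsedate_to_datetime(expires) > now
    except (TypeError, ValueError):
        return False  # unparseable or timezone-less date: treat as stale


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
headers = {
    "Last-Modified": "Fri, 01 Dec 2023 10:00:00 GMT",
    "Expires": "Tue, 02 Jan 2024 00:00:00 GMT",
}
print(is_fresh(headers, now))  # True
```

Treating missing or invalid dates as stale is the cautious choice here: the
object is then fetched again instead of being served from the cache.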
Objects in the cache are normally replaced, because of a lack of free hard
disk space, using algorithms such as LRU (least recently used). Basically
this means that the proxy expunges the objects that have not been
requested for the longest time.
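The LRU principle can be sketched in a few lines of Python. This is a minimal
model, not Squid's replacement code: the LruCache class is hypothetical, and
capacity is counted in objects here instead of bytes of disk space.

```python
from collections import OrderedDict


class LruCache:
    """Keep at most `capacity` objects; evict the least recently used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, url):
        if url not in self._items:
            return None
        self._items.move_to_end(url)  # a request makes the object "recent"
        return self._items[url]

    def put(self, url, body):
        self._items[url] = body
        self._items.move_to_end(url)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LruCache(capacity=2)
cache.put("/a", b"A")
cache.put("/b", b"B")
cache.get("/a")            # "/a" is now more recent than "/b"
cache.put("/c", b"C")      # cache is full, so "/b" is expunged
print(list(cache._items))  # ['/a', '/c']
```

Although "/a" was stored first, it survives because it was requested after
"/b"; the order of the last request decides, not the order of storage.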