ARBA MINCH UNIVERSITY (AMU)
INSTITUTE OF TECHNOLOGY
SCHOOL OF POSTGRADUATE STUDIES
DEPARTMENT OF COMPUTER SCIENCE
AND
INFORMATION TECHNOLOGY
A HYBRID PROXY CACHING REPLACEMENT SCHEME TO IMPROVE THE PERFORMANCE OF WEB CACHING
THE CASE OF ARSI UNIVERSITY

Master’s thesis

By:
Ayinachew Gashaw

Advisor: Dr. J. R. Arunkumar (PhD)

A Thesis Submitted to the School of Postgraduate Studies in Partial Fulfillment of the Requirements for the Degree of Master of Science in
Information Technology

August 2018
Arba Minch, Ethiopia

Declaration
I, the undersigned, do hereby declare that this thesis is the result of my own investigation and research, except to the extent indicated in the Acknowledgments and References, and that it has not been submitted in part or in full for any other degree to any other university.
Ayinachew Gashaw _______________________ _______________
Name Signature Date
ARBA MINCH UNIVERSITY
SCHOOL OF POSTGRADUATE STUDIES
ADVISOR’S THESIS SUBMISSION APPROVAL SHEET
This is to certify that the thesis entitled “A Hybrid Proxy Caching Replacement Scheme to Improve the Performance of Web Caching: The Case of Arsi University”, submitted in partial fulfillment of the requirements for the degree of Master of Science in Information Technology in the Graduate Program of the Department of Computer Science and IT, has been carried out by Ayinachew Gashaw under my supervision. Therefore, I certify that the student has fulfilled the requirements and recommend that the thesis be submitted to the department for defense.

Name of Advisor: Dr. J. R. Arunkumar (PhD)
Signature:
Date: 24/08/2018

EXAMINER’S THESIS APPROVAL SHEET
This thesis, entitled “A Hybrid Proxy Caching Replacement Scheme to Improve the Performance of Web Caching: The Case of Arsi University”, has been approved by the following examiners in partial fulfillment of the requirements for the degree of Master of Science in Information Technology.
SUBMITTED BY:

Ayinachew Gashaw ___________________ ______________
Student Signature Date
Approved by:
Dr. J. R. Arunkumar ________________ ______________
Name of Principal Advisor Signature Date
___________________ __________________ _____________
Name of External Examiner Signature Date
___________________ ___________________ ______________
Name of Internal Examiner Signature Date
___________________ ___________________ ______________
PG Coordinator Signature Date
___________________ ___________________ ______________
Name of Chair Person Signature Date
___________________ ___________________ ______________
Department Head Signature Date
Acknowledgment
Primarily, I would like to express my deep gratitude to my advisor, Dr. J. R. Arunkumar, for his continuous guidance and support during this thesis. I am very grateful for his patience and his willingness to read my work and provide very useful and constructive suggestions.
I must also extend my gratitude to my co-advisor, Mr. Animaw Kerie, for his great contribution to this thesis, and to Hacahulu Atomsa of the ICT Directorate of Arsi University for his contribution and advice in carrying out this work on the university compound.
I would also like to thank all my classmates during my four years as a postgraduate student at AMU, especially my best friend Zenebe Assebe, for his assistance and contribution.
Finally, yet importantly, I owe many thanks to my wife, Tigist Fikadu, for her continuous support and encouragement throughout my postgraduate study. Without her support, it would have been much more challenging for me to finish my postgraduate education on time.
Table of Contents
Declaration
Acknowledgment
List of Figures
List of Tables
List of Charts
Abbreviations
Abstract
CHAPTER ONE
1. Introduction
  1.1 Problem Statement
  1.2 Objective of the Study
    1.2.1 General Objective
    1.2.2 Specific Objectives
  1.3 Research Questions
  1.4 Significance of Study
  1.5 Methodology
    1.5.1 Research Design
    1.5.2 Tools
  1.6 Scope and Limitation of the Study
  1.7 Thesis Organization
CHAPTER TWO
2. Literature Review
  2.1 Overview
  2.2 Web Proxy Caches
    2.2.1 How a Proxy Cache Works
    2.2.2 Types of Web Caching
  2.3 Web Caching Architectures
    2.3.1 Hierarchical Caching Architecture
    2.3.2 Distributed Caching Architecture
    2.3.3 Hybrid Caching
  2.4 Cache Performance Analysis
    2.4.1 Common Performance Metrics
    2.4.2 Performance Factors
  2.5 Replacement Policies
    2.5.1 Why We Research Replacement Policies
    2.5.2 Factors to Be Considered
    2.5.3 Existing Replacement Algorithms
  2.6 Studies Based on Web Caching
CHAPTER THREE
3. Research Methodology
  3.1 Raw Data Collection
  3.2 Data Pre-processing
  3.3 Simulator Description
  3.4 Assumptions
  3.5 Experimental Design
    3.5.1 Cache Sizes
    3.5.2 Cache Replacement Policies
    3.5.3 Performance Metrics
  3.6 Virtual Cache Experiments
CHAPTER FOUR
4. Result Analysis and Discussion
  4.1 Workload Characterization
    4.1.1 Summary of Workload Characteristics
    4.1.2 Response Status Codes
    4.1.3 Relationship between Workload Characterization and Simulation Results
      4.1.3.1 Cacheable/Uncacheable Objects and Response Status Codes
      4.1.3.2 Object Types
      4.1.3.3 One-Time Referencing
  4.2 Replacement Policies Simulation Results
    4.2.1 Final Results
  4.3 Virtual Cache Simulation Results
    4.3.2 Virtual Cache Control of Hit Rate and Byte Hit Rate
CHAPTER FIVE
5. Conclusion and Recommendation
  5.1 Conclusion
  5.2 Recommendation
References
Appendix A
List of Figures
Figure 1. Cache
Figure 2. Locations of web cache
List of Tables
Table 1. Summary of proxy access log characteristics (reduced data set)
Table 2. Breakdown of document types and sizes
Table 3. One-time referencing behavior in web proxy workloads
Table 4. Proxy log hit rate performance ratings
Table 5. Proxy log byte hit rate performance ratings
Table 6. Final hit rate performance rating
Table 7. Final byte hit rate performance rating
List of Charts
Chart 1. Percentage of Cacheable/Uncacheable Requests
Chart 2. Percentage of Cacheable/Uncacheable Requests (Bytes)
Chart 3. HR/BHR Performance Comparison (Small)
Chart 4. HR/BHR Performance Comparison (Medium)
Chart 5. HR/BHR Performance Comparison (Large)
Abbreviations
ARU Arsi University
BHR Byte Hit Rate
CARP Cache Array Routing Protocol
CPU Central Processing Unit
FIFO First In First Out
GDS Greedy-Dual Size
GDSF Greedy-Dual Size Frequency
HR Hit Rate
HTTP Hypertext Transfer Protocol
HTML Hypertext Markup Language
ICP Internet Cache Protocol
ISPs Internet Service Providers
LFU Least Frequently Used
LRU Least Recently Used
LRFU Least Recently/Frequently Used
LFU-DA Least Frequently Used with Dynamic Aging
QoS Quality of Service
TCP Transmission Control Protocol
VC Virtual Cache
URL Uniform Resource Locator
WWW World-Wide Web

Abstract
Web caching improves performance and offers efficient access to web content. It plays a vital role in reducing server load, the latency perceived by users, network traffic, and bandwidth consumption. Even so, a web cache has limited storage and therefore cannot hold every web object. Numerous schemes have been used to address this issue, yet those strategies do not sufficiently improve the cache hit rate and cache byte hit rate.
The main concern of this research study is to investigate the best cache replacement policy for optimizing the cache hit ratio and byte hit ratio experienced by web users.
Different replacement policies will be tested with the log files obtained from the university proxy server.
However, most cache replacement policies operate on a single object characteristic; because of this, most of them cannot achieve both a better hit ratio and a better byte hit ratio for improved web caching performance. This research presents a hybrid proxy caching replacement method aimed at improving both the cache hit ratio and the byte hit ratio. The researcher analyzes the importance of different web proxy workload features such as object size, recency of reference, frequency of reference, and turnover in the active set of objects. Trace-driven simulation is used to evaluate the effectiveness of various replacement policies for web proxy caches. The extended duration of the trace (20 million requests collected over 45 days) allows the long-term effects of replacement policies to be identified and measured. The researcher also presents the idea of virtual caches, an approach for improving cache performance for multiple metrics at the same time.
The results show that higher cache hit rates are achieved using size-based replacement policies (GDSF), since these policies store many small objects in the cache, thereby increasing the probability of an object being in the cache when requested. Likewise, frequency-based policies (LFU-DA) achieve higher byte hit rates, as they keep the most popular documents, regardless of size, in the cache. The target beneficiary of the study is ARU, through better management of web caching, which is vital for clients to access web content easily.
Keywords: performance evaluation, proxy caching, replacement policies, trace-driven simulation,
cache Hit rate, byte hit rate, web caching, frequency, recency
CHAPTER ONE
1. Introduction
The World Wide Web (WWW) is a global system of information resources delivered over interconnected computer networks, and it has become an integral component of modern society. Whether in business, research, education, or entertainment, the ability to access vast amounts of information almost instantaneously has given web users enormous freedom and power. Because of this, the web continues to grow on a daily basis: in the number of users, the number of servers, and the content available to the users accessing those servers. As a result, users suffer from slow response times, network traffic, server load, and related problems.
One of the important means to improve the performance of web service is to employ caching mechanisms.
A web cache is a storage device that is used to temporarily keep web pages and contents to be served for subsequent requests of the same item.
There are two main reasons that web caches are used (1):
To reduce latency — because the request is satisfied from the cache (which is closer to the client) instead of the origin server, it takes less time for it to get the representation and display it. This makes the Web seem more responsive.
To reduce network traffic — because representations are reused, it reduces the amount of bandwidth used by a client. This saves money if the client is paying for traffic, and keeps their bandwidth requirements lower and more manageable.
Web proxy caches have an important role in reducing server loads, client request latencies, and network traffic. When caching is not in place, requests for content must make multiple trips back and forth between the requesting computer and the server site where the content is stored. When caching is implemented, frequently accessed content is stored close to the users, eliminating this duplicated effort. A request from a user’s browser is first sent to the network’s caching server. If the requested content can be found on the web cache and the information is current, or fresh, the content is sent directly back to the requester, skipping an upstream journey to the target website. This is called a web cache hit. When the proxy receives a request from a client, the proxy attempts to fulfill the request from among the objects stored in the proxy’s cache. If the requested object is found (a cache hit), the proxy can immediately respond to the client’s request.
If the requested object is not found (a cache miss), the proxy attempts to retrieve the object from another location, such as a peer or parent proxy cache or the origin server. The proxy can complete its response to the client once a copy of the object has been retrieved. If the requested object is cacheable (based on information provided by the origin server or determined from the URL), the proxy may decide to add a copy of it to its cache; otherwise (again determined from the URL or information from the origin server), the proxy should not store a copy in its cache.
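The hit/miss flow described above can be sketched in a few lines of Python. This is a minimal illustration only: the dictionary `cache` and the `fetch_from_origin` function are hypothetical stand-ins for a real proxy's object store and network layer.

```python
# Minimal sketch of a proxy's hit/miss flow (illustrative names).
cache = {}  # url -> object body

def fetch_from_origin(url):
    # Placeholder for an upstream fetch (peer, parent, or origin server).
    return "body-of-" + url

def handle_request(url, cacheable=True):
    if url in cache:                    # cache hit: respond immediately
        return cache[url], "HIT"
    body = fetch_from_origin(url)       # cache miss: retrieve upstream
    if cacheable:                       # store a copy only if cacheable
        cache[url] = body
    return body, "MISS"

body, status = handle_request("http://example.com/a")
assert status == "MISS"                 # first request misses
body, status = handle_request("http://example.com/a")
assert status == "HIT"                  # repeat request hits
```

Real proxies add freshness checks and cacheability rules on top of this skeleton, but the control flow is the same.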
Caching can be implemented at different stages in the path of an HTTP request:
i) in the browser of the client,
ii) at some middle point between the client and the server (proxy web server), and
iii) close to the original web server (inverse proxy or surrogate).
Browser caches store objects requested by a single user, while proxy caches store objects for many users, usually in a homogeneous segment of Internet users; e.g., a university, a private or public corporation, or an ISP. Due to the larger number of users connected to the same proxy server, object characteristics (popularity, spatial and temporal locality) can be better exploited, increasing the cache hit ratio and improving web performance.
A key component of any caching system is the policy used to remove an object from the cache when the cache fills up. If the cache had an infinite amount of space available, this would not be a problem; but despite the continually falling price of storage, that is not a realistic assumption. Because of this, the replacement policy in use will continue to be very important. There has been continuing research in the area of web caching, although the field is not as mature as memory or file caching; again, this is due to the special problems presented by caching on the web. Numerous replacement policies have been proposed that are relevant to improving the performance of the caching systems currently available, but the search continues.
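As one concrete example of such a policy, a Least Recently Used (LRU) cache, one of the classic replacement policies reviewed in the literature, can be sketched as follows. This is an illustrative sketch (counting objects rather than bytes), not the hybrid scheme proposed in this thesis.

```python
from collections import OrderedDict

class LRUCache:
    """Illustrative LRU cache: evicts the least recently used object
    when capacity (here, a count of objects) is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # url -> object, ordered by recency

    def get(self, url):
        if url not in self.store:
            return None
        self.store.move_to_end(url)      # mark as most recently used
        return self.store[url]

    def put(self, url, obj):
        if url in self.store:
            self.store.move_to_end(url)
        self.store[url] = obj
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used

c = LRUCache(2)
c.put("a", 1); c.put("b", 2)
c.get("a")                   # touching "a" makes "b" the LRU entry
c.put("c", 3)                # exceeds capacity, so "b" is evicted
assert c.get("b") is None
assert c.get("a") == 1
```

A web proxy would additionally account for object sizes, which is exactly what size-aware policies such as GDSF address.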
Therefore, to improve the performance of web caching, we need an efficient cache replacement policy that optimizes the two common performance metrics, HR (Hit Rate) and BHR (Byte Hit Rate). This offers users many benefits, such as reduced latency perceived by clients, reduced server load, reduced network traffic, and saved bandwidth.
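The two metrics can be stated precisely: the hit rate is the fraction of requests served from the cache, while the byte hit rate is the fraction of bytes served from the cache. A small helper makes the distinction concrete; the trace format used here, (url, size_in_bytes, was_hit) tuples, is a hypothetical one chosen for illustration.

```python
# HR and BHR over a request trace of (url, size_in_bytes, was_hit) tuples.
def hit_rate(trace):
    return sum(1 for _, _, hit in trace if hit) / len(trace)

def byte_hit_rate(trace):
    total = sum(size for _, size, _ in trace)
    served = sum(size for _, size, hit in trace if hit)
    return served / total

trace = [("a", 100, True), ("b", 900, False), ("c", 100, True), ("d", 100, True)]
assert hit_rate(trace) == 0.75        # 3 of 4 requests were hits
assert byte_hit_rate(trace) == 0.25   # 300 of 1200 bytes came from cache
```

The example shows why the two metrics diverge: a policy that favors many small objects scores well on HR, while a policy that retains large popular objects scores well on BHR.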
1.1 Problem Statement
Web sites play an important role in academic institutions, especially in universities and colleges, as they are used as a method of teaching and a means of online internationalization. Universities that provide services to their students, instructors, administrative staff, and other members of the community must target efficient and more reliable services while remaining cost-effective. Arsi University (ARU) currently faces problems such as heavy network traffic, frequent overloading of the origin server, processing and transmission delays, and slow services provided to the community. To address those problems and enhance the performance of the network, a better cache replacement policy will be implemented. However, there are different performance metrics a web caching algorithm may optimize. By considering the two common performance metrics, hit rate (as in other caching systems) and byte hit rate (a consequence of varying-size documents), the performance of web caching in the university can be improved.
In general, an improvement in the hit rate will result in reduction of the latency experienced by the client. On the other hand, an improvement in the byte hit rate will tend to reduce the internal traffic of a website.
Therefore, it appears that a web caching technique can be fine-tuned for one of the two performance metrics, but not for both at the same time, in order to solve the current problems faced by the university (at the Asella branch), such as heavy network traffic, bandwidth consumption, and latency. Consequently, effective proxy caching techniques benefit the university by allowing it to serve a greater number of Internet users without the need to enlarge the capacity of the existing network.
1.2 Objective of the study
1.2.1 General Objective
The main goal of this research is to design a hybrid proxy caching replacement scheme by considering the performance metrics (Hit Rate and Byte Hit Rate) to improve the performance of web caching in the university.
1.2.2 Specific Objectives
To accomplish the above-stated general objective, the following specific objectives are developed:
To study and examine existing web caching algorithms.
To investigate the possibility of identifying a policy that optimizes both metrics at the same time, or that at least allows easy control over which performance metric it optimizes.

To develop a trace-driven simulation to compare the performance of the caching algorithms and evaluate the adaptive model.
1.3 Research Questions
Based on the statement of the problem, attempts are made to answer the questions below.
RQ1: How effective are current proxy cache replacement policies for real workloads?
RQ2: How can the performance of cache replacement policies be evaluated for the Arsi University network framework, and which of these web caching solutions performs better in optimizing the hit rate and byte hit rate?
RQ3: Which replacement policy achieves better performance in trace-driven simulation with regard to optimizing the cache hit rate and byte hit rate?
1.4 Significance of Study
This research was proposed especially for the Arsi University network framework, to provide a high cache hit rate and byte hit rate in its web caching scheme in order to improve web caching performance. The outcome will therefore benefit the Arsi University community (students, instructors, staff members, and others). It also benefits the university financially, by saving the cost incurred on the international link, and operationally, by serving a greater number of Internet users without degrading the Quality of Service (QoS) of the existing network. The outcome of this research study will also be used as a basis for related studies in this problem domain.
1.5. Methodology
1.5.1 Research Design
In this study, the following methodology is applied to optimize the performance metrics (hit rate and byte hit rate) based on access log data taken from the ARU proxy server.
Different steps or procedures can be followed to enhance the performance metrics. For the purposes of this research study, the following steps were followed.
I. Review of Related Literature
Relevant literature on proxy caching techniques is reviewed for a better understanding of the different cache replacement policies for improving web caching performance, in particular for enhancing the HR (Hit Rate) and BHR (Byte Hit Rate).
II. Data Collection
First, proxy log files and traces of requested web objects were obtained from the proxy servers of the ARU network.
1.5.2 Tools
In this study, the researcher uses Visual Studio C++ 2012 to implement a trace-driven simulator that takes the preprocessed proxy log file as input and generates files containing the HR (Hit Rate) and BHR (Byte Hit Rate) as outputs.
1.6 Scope and limitation of the study
The scope of this research was to optimize the two common performance metrics (Hit Rate and Byte Hit Rate) to improve web caching performance using the log files taken from the ARU proxy server. The researcher does not consider issues like security or authentication, since they require more in-depth coverage than can be provided in this study.
1.7 Thesis organization
The thesis is organized into five chapters. Chapter one (1) introduces the thesis and an overview of the research upon which the thesis is based. Chapter Two (2) contains the survey of related literature and research. A discussion of the research to be performed and the methodology to be used is presented in Chapter Three (3). A description of the research and numerical results are provided in Chapter Four (4). Conclusions and potential future research directions are presented in Chapter Five (5).

CHAPTER TWO
2. LITERATURE REVIEW
2.1 Overview
Using proxy caches in government computer networks is useful to the server administrator, the network administrator, and the end user, because it reduces the amount of redundant traffic that circulates through the network. Moreover, end users get quicker access to documents that are cached locally. The World Wide Web (WWW, or Web) has experienced phenomenal growth in recent years. This growth has contributed significantly to network traffic on the Internet and has motivated much research into improving the performance and scalability of the web.
Without caching, the WWW (World Wide Web) would become a victim of its own success. The size and popularity of the WWW systems have grown dramatically in the last couple of decades. The demand on the WWW systems to provide response in quick time is also increasing. The increase in web users and web applications has led to an increase in latency, network congestion and server overloading.
Web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. Similarly, various web caching techniques have been developed to increase the performance of web caching. Caching replacement scheme that is an efficient technique to enhance the user experience in WWW has been extensively researched in the recent past. Many aspects of Web caching have been widely studied by the research community. This literature review is intended as an introduction to those areas that are relevant to this thesis.

2.2 Web Proxy Caches
Web caching is the temporary storage of remote web objects on a local server (2). Web caching can effectively decrease network traffic volume and reduce the latency problem (3) by bringing documents closer to the clients. Web caching has been recognized as one of the most important techniques for helping the Internet to scale. The hit rate of a web cache is a function of the client population it serves.
Over the last several years, there has been extensive research on how to make caches cooperate to increase the total effective client population, enhance the hit ratio and byte hit ratio, and reduce document access latency.
In recent years, Web proxies have been deployed to reduce network traffic and provide better response time for Web accesses. A Web proxy consists of application level software that accepts document retrieval requests from a set of clients, forwards these requests to appropriate servers if the requested documents are not already present in the proxy’s cache, and sends documents back to the clients.
Proxies were originally designed to allow network administrators to be able to control access to the Internet from within an Intranet.
2.2.1 How a Proxy Cache Works
A proxy cache server receives HTTP requests from clients for web objects. If it finds a requested object in its cache, it returns the object to the user without disturbing the upstream network connection or the destination server. If the object is not in the cache, the proxy attempts to fetch it directly from the object's home server; the originating server, which has the object, returns it to the proxy, which possibly deposits a copy in its cache and returns the object to the user.

The cache functionality is depicted in Figure 1.

Figure 1. Cache (decision flow: the user requests data; if the data are in the cache, a cache hit, they are provided to the user directly; otherwise, a cache miss, the data are downloaded, stored in the cache, and then provided to the user).
2.2.2. Types of web caching
Web caching can be done using varying degrees of cache level: from zero to many levels. As a client requests a web object, it flows from a server, through a network, and to the client. Between a client and a server may be one or more proxy servers.

Web cache servers are widely deployed in many places throughout the Web (4). Three distinct approaches to web caching currently exist: client-side caching (browser cache) (5), server-side caching (cache server) (6), and proxy caching (proxy server) (7). The figure below shows the locations of web caches.

Figure 2. Locations of web cache
A. Browser cache
A web browser cache stores local copies of recently accessed web objects, based on a specific cache management policy. As it keeps copies for a single client only, the browser cache does not capture any shared accesses between clients. There are two forms of client caches. A persistent client cache keeps web objects between invocations of the web browser. A non-persistent client cache removes cached copies when the user quits the browser.
B. Proxy cache
In general, a proxy is a special HTTP server that can run on a firewall machine. A firewall is a security system, implemented in either hardware or software, that protects a networked server from intentional or accidental damage and from unauthorized access.
The proxy cache is located on a machine on the path from multiple clients to multiple servers. All clients within a given subnet can use the same proxy. This makes it efficient for a proxy to cache web objects requested by a number of clients. When a single proxy cache communicates solely with its clients and servers, it is called an isolated cache.
It is possible to use a set of caching proxies that cooperate with each other to improve performance. These are called cooperative caches. The configuration may be hierarchical, so that the caches can be identified as first-level caches, second-level caches, and so on. It may also be non-hierarchical.
Proxy caches can be implemented either as explicit or transparent proxies. Explicit proxies require the user to configure their browser to send HTTP GET requests to the proxy.
Transparent proxies do not require explicit browser configuration; instead, a network element (switch or router) in the connection path intercepts requests to TCP port 80 (the standard HTTP port) and redirects that traffic to the cache. The cache then determines whether it already holds the requested object and can serve it.

Because cached objects can change on the origin server without informing the cache, a cache must also determine whether each object it serves is fresh. This freshness decision is typically based upon the object’s last modification time and the time of last retrieval or validation. If the object is fresh, the cache serves it directly; if not, the cache validates the object with its origin server. The validation returns a current copy of the object if it has been modified, or a status code indicating the object has not changed.
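The freshness decision described above is often implemented with a last-modified heuristic, as in common HTTP caching practice: an object is treated as fresh for some fraction of the interval between its last modification and its last validation. The sketch below illustrates the idea; the 0.1 factor and the field names are illustrative assumptions, not a specification.

```python
import time

HEURISTIC_FACTOR = 0.1  # fraction of the modification-to-validation interval

def is_fresh(last_modified, last_validated, now=None):
    """Heuristic freshness: an object validated long after its last
    modification is assumed stable, so it stays fresh longer."""
    if now is None:
        now = time.time()
    lifetime = HEURISTIC_FACTOR * (last_validated - last_modified)
    return (now - last_validated) < lifetime

now = 1_000_000
# Modified 10,000 s ago, validated 500 s ago: lifetime = 950 s > 500 s old.
assert is_fresh(last_modified=now - 10_000, last_validated=now - 500, now=now)
```

When `is_fresh` returns false, the cache revalidates with the origin server, which either returns a new copy or a status code indicating the object is unchanged, exactly the validation step described above.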

A caching proxy has a difficult job. First, its arrival traffic is the union of the URL requests of many clients. For a caching proxy to have a cache hit, either the same user must request the same document two or more times, or two different users must request the same document. Second, a caching proxy often functions as a second (or higher) level cache, receiving only the misses left over from web clients that use a per-client cache (e.g., Mosaic and Netscape). The misses passed to the proxy server from the client usually do not contain a document requested twice by the same user. The caching proxy is therefore left to cache documents requested by two or more users. This reduces the fraction of requests that the proxy can satisfy from its cache, known as the hit rate.

Significantly, proxy servers play a key role between users and web sites in reducing the response time of user requests and saving network bandwidth. Therefore, to achieve better response times, an efficient caching approach should be built into the proxy server. Proxy caching is widely utilized by computer network administrators, technology providers, and businesses to reduce user delays and to alleviate Internet congestion (8).
C. Server cache
Server caching is another term for placing a cache in front of a web server. This is called “server” caching because it is implemented by the administrators of the web servers, rather than by the clients. The goal here is to cache and distribute web objects from the servers and to offload the processing of client requests.
Even at the origin server, web pages can be stored in a server-side cache, reducing the need for redundant computations or database retrievals. Thus, the server load can be reduced if an origin server cache is employed.
2.3 Web Caching Architectures
A web caching infrastructure is built using groups of HTTP proxy servers that share cached objects. To make caches cooperate on a large scale and effectively increase the cached client population, several caches are usually associated in a caching architecture. On the Internet, there are many caches with no communication between them; these are isolated caches. The performance of a web cache system depends on the size of its community: the hit rate increases with the number of clients connected to the caches. Thus, a simple way to increase the hit rate is to make these isolated caches cooperate.
2.3.1 Hierarchical caching architecture
One approach to coordinating caches in the same system is to set up a caching hierarchy. With hierarchical caching, caches are placed at multiple levels of the network. For the sake of simplicity, we assume that there are four levels of caches: bottom, institutional, regional, and national (9). At the bottom level of the hierarchy are the client/browser caches. When a request is not satisfied by the client cache, it is redirected to the institutional cache. If the document is not found at the institutional level, the request is forwarded to the regional cache, which in turn forwards unsatisfied requests to the national cache. If the document is not found at any cache level, the national cache contacts the origin server directly. When the document is found, either at a cache or at the origin server, it travels down the hierarchy, leaving a copy at each of the intermediate caches along its path.
Further requests for the same document travel up the caching hierarchy until the document is hit at some cache level.
A hierarchical architecture is more bandwidth-efficient, particularly when some cooperating cache servers do not have high-speed connectivity. In such a structure, popular web pages can be efficiently diffused toward the areas of demand.
2.3.2 Distributed Caching Architecture
In distributed web caching systems, there are no intermediate cache levels other than the institutional caches, which serve each other's misses. In order to decide from which institutional cache to retrieve a missed document, all institutional caches keep meta-data information about the content of every other institutional cache. With distributed caching, most of the traffic flows through low network levels, which are less congested. In addition, distributed caching allows better load sharing and is more fault tolerant.
Nevertheless, a large-scale deployment of distributed caching may encounter several problems, such as high connection times, higher bandwidth usage, and administrative issues (10).
There are several approaches to distributed caching. One is the Internet Cache Protocol (ICP), which supports discovery and retrieval of documents from neighboring caches as well as parent caches. Another approach is the Cache Array Routing Protocol (CARP), which divides the URL space among an array of loosely coupled caches and lets each cache store only the documents whose URLs hash to it (11).
2.3.3 Hybrid Caching
A hybrid cache scheme is any scheme that combines the benefits of both hierarchical and distributed caching architectures. Caches at the same level can cooperate with each other, as well as with higher-level caches, using the concept of distributed caching (12). In a hybrid caching scheme, a certain number of caches cooperate at every network level of the caching hierarchy. Here, the influence of the number of cooperating caches at each level on the total retrieval latency is considered. It has been found that there is an optimum number of cooperating caches at every cache level that minimizes the total latency. This optimum number depends on the network congestion, the parent cache congestion, and the size of the document.

In hybrid caching when an item cannot be found within the cache, the cache first checks if any of the cooperating caches within the same network level has the copy of the item requested using the notion of distributed caching. If multiple caches happen to have the item copy, the cache with the lowest latency is selected. However, if the item is not found in any of the cooperating caches, the request is forwarded to the cache in the higher network level, using hierarchical caching.
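The lookup procedure described above can be sketched as follows. This is a minimal illustrative sketch, not the thesis's implementation: the `Cache` class, `fetch_from_origin`, and the latency fields are all assumptions made for the example.

```python
class Cache:
    """A toy cache node with same-level peers and an optional parent level."""
    def __init__(self, latency=1, parent=None):
        self.store = {}          # url -> object
        self.latency = latency   # assumed cost of fetching from this cache
        self.peers = []          # cooperating caches at the same network level
        self.parent = parent     # cache at the next higher network level

def fetch_from_origin(url):
    # Stand-in for contacting the origin server at the top of the hierarchy.
    return "<body of %s>" % url

def hybrid_lookup(cache, url):
    """Return (object, source) following the hybrid scheme: local store,
    then same-level peers (distributed step), then parent (hierarchical step)."""
    if url in cache.store:
        return cache.store[url], "local"
    # Distributed step: probe cooperating caches at the same level.
    holders = [(p.latency, id(p), p) for p in cache.peers if url in p.store]
    if holders:
        _, _, peer = min(holders)        # choose the lowest-latency copy
        obj = peer.store[url]
        cache.store[url] = obj           # keep a local copy on the way
        return obj, "peer"
    # Hierarchical step: forward the miss to the higher network level.
    if cache.parent is not None:
        obj, _ = hybrid_lookup(cache.parent, url)
    else:
        obj = fetch_from_origin(url)
    cache.store[url] = obj
    return obj, "parent"
```

As in the hierarchical scheme, a copy is left at each cache the object travels through, so later requests can be served closer to the client.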

The main performance measure is the expected latency to retrieve a web document. It is debatable which caching architecture achieves the optimal performance. (9) shows that hierarchical caching has shorter connection times than distributed caching; hence, placing additional copies at intermediate levels reduces the retrieval latency for small documents.

It has also been shown that distributed caching has shorter transmission times but higher bandwidth usage than hierarchical caching. A well-configured hybrid scheme can combine the advantages of both hierarchical and distributed caching, reducing the connection time as well as the transmission time.

2.4. Cache Performance Analysis
2.4.1. Common Performance Metrics
Several metrics are used to assess the performance of a web caching system.
A. Hit ratio
It is the number of requests that hit in the proxy cache as a percentage of total requests. It is a good measure when the cache size is small and the sizes of documents are similar. A higher hit ratio indicates a better caching policy; however, it may only be relevant if the objects are homogeneous. It represents the percentage of all requests serviced by a cached copy of the requested object, instead of contacting the object's origin server.
In contrast to a cache hit, a cache miss is a failure to find the required data within the cache; in that case, the item is obtained from the origin web server. The higher the hit ratio, the better the replacement policy (13).
B. Byte hit ratio
It is the number of bytes that hit in the proxy cache as a percentage of the total number of bytes requested. It is a good measure when the cache size is large and the sizes of documents differ. When objects are of heterogeneous sizes, the byte hit ratio is the better metric.
It also represents the percentage of all data transferred from the cache, i.e., the ratio of the size of objects retrieved from the cache server to the total size requested. The byte hit ratio provides an indication of the network bandwidth saved.
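The two metrics above can be computed directly from a request trace. The following is a minimal sketch; the trace representation (a list of `(hit, size)` tuples) is an assumption made for illustration, not a format from the thesis.

```python
def hit_and_byte_hit_ratio(trace):
    """Compute (hit ratio, byte hit ratio) from a trace of requests.

    trace: list of (hit, size_in_bytes) tuples, where hit is True when the
    request was served from the cache.
    """
    hits = sum(1 for hit, _ in trace if hit)
    hit_bytes = sum(size for hit, size in trace if hit)
    total_bytes = sum(size for _, size in trace)
    hr = hits / len(trace)           # fraction of requests served from cache
    bhr = hit_bytes / total_bytes    # fraction of bytes served from cache
    return hr, bhr
```

Note how the two metrics can diverge: a trace dominated by hits on small objects yields a high hit ratio but a low byte hit ratio.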

C. Bandwidth utilization
It is an important metric: an algorithm that reduces bandwidth consumption is better (14).
D. User response time
It is the amount of time a user waits for the system to retrieve the requested object (14). It is also known as latency.
E. Cache server CPU and I/O utilization
The fraction of total available CPU cycles or disk I/O consumed, together with object retrieval latency (14). Latency is inversely related to the object hit ratio, because a cache hit can be served more quickly than a request to the origin server. However, optimizing one metric does not imply optimizing the others; for example, increasing the hit rate does not necessarily minimize latency (14).
2.4.2 Performance factors
A. User access patterns
If a user accesses small objects more frequently, then those small objects are obvious candidates for caching. User access patterns are not static, and therefore a good caching algorithm should not be static either (15). Cache replacement algorithms decide which objects are discarded when the cache is full.
B. Cache removal period
The cache removal period dictates when documents are removed once there is no space in the cache (15). A continuous removal period implies that objects are removed as soon as the cache has no space to hold the object currently being accessed. A fixed removal period means that objects are removed only at the beginning of each removal period (14).
C. Cache size
A larger cache can store more objects, which increases the hit ratio. A large cache is, however, expensive in terms of processing cost and complexity (16); cache size is therefore a trade-off between cache cost and cache performance. In a small cache, a caching mechanism may store either many small objects or a few large ones. The maximum cacheable object size is a user-defined factor that puts a limit on the size of objects that can be stored in the cache (5).
D. Cooperation
Cooperation refers to the coordination of user requests among many proxy caches in a hierarchical proxy cache environment.
E. Consistency
Consistency means maintaining updated replicas of objects in the cache: when a client sends a request for data to the proxy server, that data should be up to date (17).
2.5 Replacement Policies
2.5.1 Why We Research Replacement Policies
A cache server has a fixed amount of storage for storing objects. When this storage space is full, the cache must remove some objects in order to make room for newly requested objects. The cache replacement policy determines which objects should be removed from the cache. The key aspect of the effectiveness of proxy caches is a document placement/replacement algorithm that can optimize two common performance metrics: the hit rate and the byte hit rate.
A replacement policy is increasingly becoming one of the fundamental components of caching mechanisms, acting as a key technology for high-speed multimedia service delivery. The replacement policy acts as a decision rule for evicting a page currently in the cache to make room for a new page in case of cache saturation, thus determining when and what to evict from the cache (18). Hence, cache replacement algorithms are also called web caching algorithms.
The goal of a replacement policy is to make the best use of available resources such as disk space, memory space, and network bandwidth. Since web use is the dominant cause of network backbone traffic today, the choice of cache replacement policy can have a significant impact on global network traffic.
The problem of document replacement in web caches has received much attention in recent research, and it has been shown that the removal rule “replace the least recently used document” performs poorly in web caches. Instead, it has been shown that using a combination of several criteria, such as the recentness and frequency of use, the size, and the cost of fetching a document, leads to a sizeable improvement in hit rate (in case of homogeneous documents) as well as byte hit rate (in case of heterogeneous documents) and latency reduction.
Cache replacement is, however, the heart of web caching, and the design of efficient cache replacement algorithms is required to achieve a highly sophisticated caching mechanism. As cache size is limited, a cache replacement policy determines which object is to be evicted to make room for new objects and to utilize the cache space efficiently.
An efficient cache replacement algorithm is used to eliminate "cache pollution". Cache pollution means that the cache contains objects that will not be requested frequently in the near future, which eventually leads to inefficient use of cache space. There are two kinds of cache pollution: "cold cache pollution" refers to unpopular objects that remain in the cache for a long time, whereas "hot cache pollution" refers to objects that were once popular but no longer are, yet stay in the cache for a long time.

The efficiency of the replacement policy affects both the hit rate and the access latency of a cache system. Replacement policy affects the overall performance of the cache, not only in terms of hits and misses, but also in terms of bandwidth utilization and response time.

Most web browsers still use traditional replacement policies, which are not efficient in web caching (19). In fact, a few important factors of web objects can influence the replacement policy (1). These factors include, but are not restricted to, recency (i.e., time of the last reference to the object), frequency (i.e., number of previous requests to the object), size, and access latency of the web object. These factors can be incorporated into the replacement decision.
Most approaches proposed in the literature use one or more of these factors without paying attention to combining them. However, combining these factors is still a challenging task, as one factor may be more important in a particular environment than in others (20).
A poor replacement policy can increase the number of misses in the cache, cause a lot of traffic out of the cache, and increase the cache miss penalty. For these reasons, a lot of research, both in academia and industry, is geared toward finding the best cache replacement policy.

2.5.2 Factors to Be Considered
Many factors should be considered when the cache has to make a replacement decision (21).
A. Live Documents
We say a document is live if it will be requested in the future. The cache only needs to retain live documents to achieve the maximum hit rate. Live documents are a small fraction of all documents; thus, it is appropriate to consider documents dead if they have not been requested for more than some reasonably large time.

B. Inter-access time
Inter-access time is the time between successive requests for a document. Documents with lower inter-access times are more likely to be requested in the future. Because it always selects the document with the largest inter-access time for eviction, the LRU algorithm is the best replacement algorithm for reducing the average cached-object inter-access time.
C. Number of Previous Accesses
The number of previous accesses made to a document is a good indication of whether the document will be requested in the future. However, since it does not include any aging information about the document, it cannot be used alone as the deciding factor.

D. Document Size
The document size is another important factor for caching. In proxy caching, the cached documents can be of different sizes. Having more documents in the cache will likely lead to a higher hit ratio, so one might choose to cache more small documents at the expense of performance for larger documents.

E. Type of Document
The type of the document can also be an important factor to consider. Actually, many of the requested objects are rather small image files, suggesting that a bias for document type could be beneficial.

F. Latency
It is also important to consider the cost incurred in acquiring the document. The more expensive the document to download, the better it is to retain the document in the cache because the penalty for a cache miss is greater.

2.5.3 Existing Replacement Algorithms
Existing replacement algorithms are classified into three categories, according to whether they exploit access recency and access frequency, and whether they are sensitive to the variable cost and size of objects. The classifications are recency-based policies, frequency-based policies, and recency/frequency-based policies (with fixed cost/fixed size algorithms and variable cost/size algorithms).

Random:
The Random policy selects a page to replace from the existing pages in the buffer pool at random (as the name suggests). As such, no criteria are used to select a victim.
FIFO
The FIFO (First-In-First-Out) policy replaces the earliest cached page in the buffer pool. FIFO's weakness lies in not being able to handle scan sequences where the number of unique pages accessed in the scan is greater than the size of the buffer pool. In this case, the FIFO policy would need to read the pages from disk for each request.

LRU (Least Recently Used):
The LRU policy is one of the most popular policies. It evicts the least recently referenced object first. It is particularly popular because of its simplicity and fairly good performance in many situations. It is designed on the assumption that a recently referenced document will be referenced again in the near future. An LRU threshold is needed for estimating the expected time to fill or completely replace the cache contents; this threshold is dynamically evaluated based on the current cache size. One of the disadvantages of LRU is that it only considers the time of the last reference and gives no indication of the number of references to a given web object (22). The LRU-MIN algorithm (21) is similar to LRU: it maintains a list of cached documents sorted by the time each document was last used.
The difference between LRU and LRU-MIN is the method of selecting the candidate for replacement. When the cache needs to replace a document, it searches from the tail of the sorted list, and the first document whose size is larger than or equal to the size of the new document is removed.
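A minimal LRU sketch is shown below. For simplicity, capacity is counted in objects rather than bytes (an assumption made for the example); `OrderedDict` keeps entries ordered by last use, so the eviction victim is always at the front.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently referenced object first."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()    # key -> value, ordered by last use

    def get(self, key):
        if key not in self.data:
            return None                  # cache miss
        self.data.move_to_end(key)       # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)    # evict least recently used
```

Both lookup and insertion are O(1), which is one reason LRU remains popular despite ignoring reference counts.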

LFU (Least Frequently Used):
Since a network proxy is required to serve thousands of requests per second, the overhead of doing so should be kept to a minimum. To achieve this, the proxy should evict only resources that are not frequently used; frequently used resources should be kept at the expense of infrequently used ones, since the former have proved themselves useful over a period of time. Static resources of heavily used pages are requested by every user of those pages. Hence, caching proxies can employ the LFU replacement strategy to evict the least frequently used items in the cache (23). The standard LFU method keeps track of the number of times a block is referenced in memory; when the cache is full and more room is required, the system purges the item with the lowest reference frequency.
The major disadvantage of the LFU replacement algorithm is that an object can keep its place in the cache for a long time even if it is never used again, wasting cache space on an element that stays in memory unchanged. Other disadvantages of LFU policies are that they require logarithmic implementation complexity in cache size and that they pay almost no attention to recent history.
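The counting scheme just described can be sketched as follows. This is an illustrative sketch only (object-count capacity and the linear-scan eviction are simplifications; a production LFU would use a priority structure, which is where the logarithmic complexity mentioned above comes from).

```python
class LFUCache:
    """Minimal LFU cache: evicts the entry with the lowest reference count."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}    # key -> value
        self.freq = {}    # key -> number of references

    def get(self, key):
        if key not in self.data:
            return None          # cache miss
        self.freq[key] += 1      # count this reference
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = min(self.freq, key=self.freq.get)   # lowest frequency
            del self.data[victim]
            del self.freq[victim]
        self.data[key] = value
        self.freq[key] = self.freq.get(key, 0) + 1
```

The weakness described above is visible here: a once-popular key keeps a high count forever, so it is never chosen as the `min` victim even after it stops being requested.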
The Least Recently/Frequently Used (LRFU)
The LRFU (Least Recently/Frequently Used) policy subsumes the LRU and LFU replacement strategies, taking advantage of both. It spans a spectrum of block replacement policies between LRU and LFU, depending on the weights given to recency and frequency. The LRFU policy associates a value with each object, called the CRF (Combined Recency and Frequency) value, which quantifies the likelihood that the object will be referenced in the near future.
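A common formulation of the CRF update (assumed here for illustration; the thesis does not give the formula) uses the weighting function F(x) = (1/2)^(λx): on each reference, the old CRF is decayed by the time since the last reference and incremented by one. With λ = 0 this degenerates to a pure frequency count (LFU); as λ grows, old references decay quickly and the policy approaches LRU.

```python
def updated_crf(old_crf, last_ref_time, now, lambda_):
    """Recompute an object's CRF value at reference time `now`.

    Uses F(x) = (1/2)**(lambda_ * x) as the decay weighting: the longer
    the gap since the last reference, the less the old CRF contributes.
    """
    decay = 0.5 ** (lambda_ * (now - last_ref_time))
    return 1.0 + decay * old_crf
```

On eviction, the block with the smallest CRF value is chosen as the victim.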
Most Recently Used (MRU) page replacement policy:
The MRU (Most Recently Used) policy removes the page that has been accessed most recently from the buffer pool. MRU, like some of the policies already mentioned, is also susceptible to greater-than-buffer-size scans as well.

MFU (Most Frequently Used):
MFU considers only the frequency of references: the blocks that have been referenced most frequently are the candidates for replacement. However, since it does not consider recency, it cannot differentiate between blocks that were once hot but are now becoming cold and blocks that are currently hot.

MF-LRU (Most Frequently-Least Recently Used) Replacement Policy
Unlike the LRU or MFU policies, which consider either recency or frequency only, the MF-LRU policy takes both into account in the replacement decision. It is a blend of two popular replacement policies, LRU (Least Recently Used) and MFU (Most Frequently Used).
The MF-LRU policy associates a value with each block, called the RFF (Recency Frequency Factor), which quantifies the importance of that block, i.e., the likelihood that the block will be referenced in the near future. The block with the least RFF is the victim for replacement. The main objective of the MF-LRU policy is to replace blocks that were frequently used in the past but have since grown old.

SIZE:
The object size is used as the primary factor (23); this policy usually removes larger objects first. The size-based policy sorts cached documents by size, and documents with the same size are sorted by recency. When there is insufficient space for caching the most recently requested document, the least recently used document with the largest size is replaced (23). This strategy tries to minimize the miss ratio by replacing one large object rather than many smaller ones. It yields a high cache hit ratio but a low byte hit ratio.

GD-SIZE:
Greedy-Dual-Size (GDS) is an extension of the SIZE policy. The algorithm combines several factors and assigns a key value to each object stored in the cache. GDS deals with variable-size documents by setting H to cost/size, where cost is the cost of fetching the document and size is the size of the document in bytes. If the cost function for each document is set uniformly to one, larger documents have a smaller initial H value than smaller ones and are likely to be replaced if they are not referenced again in the near future (21). The GDS algorithm was originally developed in the context of disk paging and was later adapted to web caching. In the web caching version, its goal is to reduce the aggregate latency, where the size and location of data can vary widely (23).
GDSF:
Greedy-Dual-Size-Frequency (GDSF) extends GDS by incorporating access frequency into the key value. However, GDSF does not predict future accesses.
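The usual GDSF key computation (assumed here from the standard formulation, since the thesis does not spell it out) is K = L + F * C / S, where L is the cache's aging value (raised to the key of each evicted object), F the object's access frequency, C its fetch cost, and S its size; the object with the smallest key is evicted.

```python
def gdsf_key(aging_value, frequency, cost, size):
    """GDSF key: higher frequency and cost keep an object longer;
    larger size makes it a more likely victim."""
    return aging_value + frequency * cost / size

def choose_victim(keys):
    """Given a dict url -> key, return the url with the smallest key."""
    return min(keys, key=keys.get)
```

With uniform cost C = 1, a large, rarely requested object gets a small key and is evicted first, while the aging term L prevents long-resident objects from being kept forever on stale frequency counts.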

2.6 Studies Based on Web Caching
Although few studies of caching algorithms existed in the early years of the World Wide Web, the research community has since studied web caching from many different angles, and much research has been conducted in the area of web proxy caching. While cache placement has not been well studied, a number of cache replacement algorithms have been proposed in recent studies, which attempt to optimize various cost metrics, such as hit rate, byte hit rate, average latency, and total cost.
Research into web cache replacement policies is very active.
The tremendous growth of the World Wide Web has motivated many research efforts aimed at improving web performance and scalability. In this section, we focus on web proxy research.
Ghosh et al. (24) presented a very detailed survey of different web caching and prefetching policies aimed at web latency reduction. They also studied various web cache replacement policies influenced by the factors of recency, frequency, size, cost of fetching an object, and access latency. They concluded that Hit Ratio (HR), Byte Hit Ratio (BHR), and Latency Saving Ratio (LSR) are the most widely used metrics for evaluating the performance of web caching.
According to (25), replacement decisions in web caching can be affected by various factors: recency, which measures the time since the last reference to the object; the frequency of requests to an object; the size of the web object; the cost of fetching the object; and its access latency.
(26) give an overview of various replacement algorithms and conclude that GDSF outperforms the others when the cache size is small.

(27) suggested that it would be much better to replace documents based on size, as this maximized the hit ratio in each of their workloads. Since the SIZE policy replaces the largest object(s) in the cache when space is needed for a new object, the cache can become polluted with small objects that will not be accessed again. They also showed that SIZE outperforms LFU, LRU, and several LRU variations in terms of different performance measures, namely cache hit ratio and byte hit ratio. However, their experiments fail to consider object frequency in the decision-making process.

(8) proposed a replacement policy called Least Recently/Frequently Used (LRFU), which subsumes both LRU and LFU and provides a spectrum of block replacement policies between them, according to how much more they weigh recent history than older history.
(4) proposed a hybrid cache replacement algorithm that uses a combination of multiple criteria: keeping in the cache documents from servers that take significant time to connect to, documents that must be fetched over the slowest links, documents that have been accessed very frequently, and documents that are small. Wooster and Abrams checked the performance of the hybrid algorithm against LRU, LFU, and SIZE, and it performed well compared with these traditional replacement algorithms.

(3) presents a dynamic prefetching technique implemented at the proxy server, in which web caching and prefetching techniques are integrated. Using this technique, the cache hit ratio is increased to 40%-75%, and latency is consequently reduced to 20%-63%.

(M. N. Wijesundara, 2002) proposes a distributed web caching scheme in which each client acts as a cache server and shares its contents with neighboring nodes. Hit rate and miss rate are considered, but cache size and latency are not discussed.
In (Tiwari, R. and N. Kumar, 2012a), a hybrid web caching algorithm is proposed to overcome the scaling and reliability problems of the World Wide Web (WWW), such as congested proxy servers.
The algorithm uses hierarchical web caching approaches along with a dynamic mechanism of proxy servers. This mechanism conserves bandwidth because most requests are satisfied from local sites.

(23) presented a taxonomy of cache retrieval policies and, by means of trace-driven simulations, measured the maximum feasible hit ratio and byte hit ratio. They suggested that it would be much better to replace documents based on size, as this maximized the hit ratio in each of their workloads.

At present, web cache memory is used to mitigate network traffic problems: it keeps web objects from origin servers, and clients can download those objects from the cache. However, the web cache has limited capacity and therefore cannot store every web object. Many techniques have been used to address this problem, but they yield an average hit rate of no more than about 40 percent.
Unfortunately, the cache hit ratio is not improved much by caching schemes. Even with a cache of infinite size, the hit ratio is still limited to the range of 40% to about 50%, regardless of the caching scheme (27). This is because most people browse and explore new web pages in search of new information.
The most common evaluation technique used in web proxy cache studies is trace-driven simulation using a captured log as the workload.
Other researchers have focused on managing proxy cache contents in order to improve hit rates (as well as other metrics). Currently, there are two approaches to cache management. One approach attempts to use as few resources as possible by making good replacement decisions when the cache is full.
The alternative approach is to provide the cache with abundant resources so that few replacement decisions need to be made. This thesis focuses on the first approach: throughout the remainder, we focus on maximizing the hit rate, the byte hit rate, or both at the same time for a proxy cache that has a limited amount of cache space.
As can be observed, many web cache replacement policies have been proposed for improving the performance of web caching. However, it is difficult to devise an omnipotent policy that performs well in all environments or at all times, because each policy has a different design rationale and optimizes different resources. Moreover, combining the factors that influence the replacement process into a wise replacement decision is not easy, because a factor that matters greatly in one situation or environment may matter less in others. Hence, there is a need for a combined approach to efficiently manage the web cache that satisfies the objectives of web caching.
In general, cache replacement algorithms attempt to maximize the hit ratio (the percentage of requests successfully fulfilled by the cache) by holding onto the items most likely to be requested in the future. Unfortunately, recent results suggest that the maximum cache hit rate that can be achieved by any caching algorithm is usually no more than 40% to 50%.
As can be observed, many replacement techniques have been proposed in the literature. However, most of this research has not focused on jointly optimizing the cache hit rate and byte hit rate to improve web caching performance; even where a hybrid scheme has been proposed, the achieved ratios are still not efficient, and improving one metric tends to negatively affect the other. The proposed schemes mostly combine different replacement policies, but they rely on a single factor and do not consider hit-ratio criteria.
This is the motivation for adopting our hybrid cache replacement policy to solve the web caching problem. In order to optimize the hit rate and byte hit rate simultaneously, this research proposes an efficient proxy caching technique to improve web caching performance and overcome these limitations.
CHAPTER THREE
3. RESEARCH METHODOLOGY
3.1. Raw Data Collection
To characterize the workload of a web proxy and to drive a trace-driven simulation of a web proxy cache, measurements of an actual web proxy workload were gathered.
Web access logs are crucial in studying web workloads and testing the performance of replacement policies. An access log contains an entry for each request received by the server. Web proxy logs files provide information about activities performed by the users when the user logs onto the servers. Each entry in the access logs records the URL of the document being requested (dynamically assigned), the date and time of the request, the name (or the IP address) of the client host making the request, the number of bytes returned to the requesting client, and additional information that describe how the client’s request was treated at the proxy. Processing these log entries can produce useful summary statistics about workload volume, document types and sizes, popularity of documents, and proxy cache performance.
Access logs covering 45 days were collected, containing information on all client requests for web objects from February 2 to March 24, 2018. The proxy server recorded 20,725,429 requests over these 45 days of activity. The access logs for six (6) days in this period were not available, due to downtime for server upgrades and network outages. Each entry includes the client IP address (dynamically assigned), the time of the request, the requested URL, the status codes for both the proxy and origin server responses, the size of the response (in bytes), and the time required to complete the response.
3.2. Data Pre-processing
Because of the extremely large access logs that were generated, the researcher found it necessary to create smaller, more compact logs, both because of storage limitations and to ensure that the workload analysis and caching simulations could be completed in a reasonable amount of time. The researcher performed these reductions in several ways while still preserving as much of the original information as possible.
In this thesis, data pre-processing involves two steps: trace preparation and dataset preparation.

In the trace preparation step, insignificant or invalid requests are removed from the log records, for example, uncacheable requests (i.e., queries with a question mark in the URL and cgi-bin requests) and entries with unsuccessful HTTP status codes.
According to the HTTP specification, responses with a status code of 200 (Successful), 203 (Non-authoritative Information), 300 (Multiple Choices), 301 (Moved permanently) and 410 (Gone) (except for dynamic requests) are considered to be cacheable.
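The filtering rule described in the two paragraphs above can be sketched as a small predicate. The function name and URL heuristics are illustrative assumptions, but the status-code set is the one listed in the text.

```python
# Status codes the text lists as cacheable per the HTTP specification.
CACHEABLE_STATUS = {200, 203, 300, 301, 410}

def is_cacheable(url, status):
    """Return True if a log entry should be kept in the cleaned trace."""
    # Drop dynamic requests: query strings and cgi-bin URLs.
    if "?" in url or "cgi-bin" in url:
        return False
    # Drop entries with unsuccessful or uncacheable status codes.
    return status in CACHEABLE_STATUS
```

Applied as a filter over the raw log, this yields the reduced trace used for the workload analysis and simulations.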
On the other hand, in order to reduce the simulation time, each URL is replaced with an integer identifier called a "URL ID". In addition, the client address is replaced with an integer identifier.
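The ID substitution can be done with a single pass over the trace, assigning each distinct string the next unused integer. This is a generic sketch (the function name and the choice to start IDs at 1 are assumptions); the same mapping works for both URLs and client addresses.

```python
def build_id_map(values):
    """Assign a compact integer ID to each distinct value, in first-seen order."""
    ids = {}
    for v in values:
        if v not in ids:
            ids[v] = len(ids) + 1    # IDs start at 1
    return ids
```

Comparing and hashing small integers during simulation is much cheaper than repeatedly comparing long URL strings, which is the point of the substitution.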
In order to prepare the dataset of Web objects, the desired features of Web objects are extracted from the trace and log files.
The features of Web objects that indicate user interest are extracted to prepare the training dataset. These features consist of the URL ID, timestamp, elapsed time, size and type of each web object.
3.3. Simulator Description
The performance of replacement policies is usually assessed in the literature by running simulations driven by actual HTTP log traces. Because of the extremely large data set used in this simulation study, it was important to implement the simulator as efficiently as possible (28). In order to reduce the time required to update the cache state following a cache hit or miss, the researcher replaced the linked list data structure with a heap.
3.4 Assumptions
Unfortunately, not all the data of interest is available in the access logs. One issue encountered was reliably detecting object modifications and client-aborted connections. To address this, it is assumed that modifications and aborts can be identified by a change in the size of the object. If no change in size occurs, the document is assumed to be unmodified; if a document is modified, it is assumed to produce a small change in its size. An aborted request is identified when the number of bytes transferred is less than the current estimate of the requested document's size. Although these assumptions are not always true, they allow a reasonable estimate to be obtained.

3.5. Experimental Design
This section describes the design of the web proxy cache simulation study. It introduces the factors and levels that are examined, and presents the metrics used to evaluate the performance of each replacement policy.
3.5.1. Cache Sizes
The first factor investigated in this study is the cache size. As a rule, the larger the cache, the higher the performance of the replacement policy will be. However, in spite of the falling prices of storage, the widening availability of broadband connections and, as a consequence, larger files will no doubt offset these gains. Because of this, the way that replacement policies perform on relatively small cache sizes will remain an important factor.

The cache size indicates the amount of space available for storing web objects. In this study, the researcher examines seven different levels for this factor: 2MB, 8MB, 32MB, 128MB, 512MB, 2048MB and 8192MB. Each level is a factor of four larger than the previous size; this allows us to easily compare the performance improvement relative to the increase in cache size. The smaller cache sizes (e.g. 2MB to 128MB) indicate likely cache sizes for web proxies. The larger values (512MB to 8192MB) indicate the performance of the cache when a significant fraction of the total requested object set is cached.
The largest cache size (8192MB) can store the entire object set and thus indicates the maximum available performance of the cache. The other cache sizes can hold approximately 0.06% (2MB), 0.25% (8MB), 1% (32MB), 4% (128MB), 16% (512MB) and 64% (2048MB) of the entire object set.
The largest cache size level in the experiment is set to 100% of the object set size, since it shows the highest performance that can be achieved by any policy, that is, the performance with an infinite cache size.
3.5.2. Cache Replacement Policies
The second factor investigated in the simulation study is the replacement policy used by the cache. The replacement policies selected for the experimentation require only the unique ID of the object being requested and its size in bytes. There are many proposed cache replacement policies, too many to cover in this study. The researcher examines eight (8) different, previously proposed replacement policies: LRU, LFU, LFU-Aging, LFU-DA, GD-Size, GD-Size(P), Size and GDSF. These policies were chosen because each one considers at least one of the proxy workload characteristics when making a replacement decision.
3.5.3. Performance metrics
In this study, two metrics are used to evaluate the performance of the proxy cache: hit rate (HR) and byte hit rate (BHR). Both metrics have been used consistently in assessments of Web caching in the research literature, and HR and BHR are the most widely used metrics for evaluating the performance of web proxy caching policies (5). Considering the trade-off between hit rate and byte hit rate, a proxy cache that is primarily intended to reduce response times for users should use a replacement policy that achieves high hit rates. In an environment where saving bandwidth on the shared external network is of utmost importance, the proxy cache should use a replacement policy that achieves high byte hit rates. A proxy cache could also use multiple replacement policies. For example, a replacement policy that achieves high hit rates could manage the proxy's memory cache in order to serve as many requests as quickly as possible and to avoid a disk I/O bottleneck, while the proxy's much larger disk cache could be managed with a policy that achieves higher byte hit rates, in order to reduce external network traffic.

3.6 Virtual Cache Experiments
There may not be a single policy that maximizes both hit rate and byte hit rate. A drawback of having two policies is that a decision must be made on which metric is most important, and therefore, which replacement policy should be used. In some situations, the choice may be obvious; in others, both high hit rates and high byte hit rates may be important.

The study developed an approach that can focus on both of these metrics simultaneously. This approach logically partitions the cache into N virtual caches. Each virtual cache (VC) is then managed with its own replacement policy. Initially, all objects are added to VC0. Replacements from VCi are moved to VCi+1, and replacements from VCN-1 are evicted from the cache. All objects that are re-accessed while in the cache (i.e., cache hits) are reinserted into VC0. This allows in-demand objects to stay in the cache for a longer period. For example, to achieve high hit rates and high byte hit rates simultaneously, two VCs are used: one VC focuses on obtaining high hit rates using a replacement policy such as GD-Size or GDSF, while the other aims to achieve high byte hit rates using a policy such as LRU or LFU-DA. This paper simulated this cache management policy to evaluate its effectiveness when using two VCs. The first experiment used the GDSF policy to manage VC0 (referred to as the Hits VC) while VC1 (the Bytes VC) employed the LFU-DA policy. In the experiments, a setup with two virtual caches (VC0 and VC1) is used.
The variables in the experiments will be:
The cache size
The size of VC0 and VC1 as a percentage of the total cache size: these will be represented as a pair of values A-B, where A is the percentage of the size of VC0 and B is the percentage of the size of VC1.

The study found that the use of the VC management approach does in fact improve the performance of the cache across multiple metrics. Both the hit rate and byte hit rate of the VC management approach appear to be bounded by those of the policies used to manage each of the individual VCs.
As the percentage of space dedicated to the Hits VC decreases, the hit rate also decreases although it remains higher than the policy used to manage the Bytes partition. At the same time the byte hit rate increases, nearing that achieved by the policy used to manage the Bytes partition. These improvements can be explained from the characteristics of the proxy workload. Due to the non-uniform popularity distribution, a cache is able to achieve relatively high hit rates with a very small amount of storage space.
However, as the cache gets larger, the rate of improvement in hit rate declines as it becomes increasingly difficult to identify objects that will continue to improve the hit rate (the diminishing returns effect). A similar argument can be made regarding the objects that affect the byte hit rate. In this study, partitioning the cache retains most of the benefit of a cache that is dedicated to a single metric, while making more effective use of the remaining space relative to a second metric.

CHAPTER FOUR
4. RESULT ANALYSIS AND DISCUSSION
In this section, we present preliminary results obtained from simulations of the selected replacement policies and simulations of the virtual caches approach, to assess the efficiency of the replacement policies in optimizing the Hit Rate (HR) and Byte Hit Rate (BHR). First, we show the workload characterization; then we present the results and discuss their implications.

4.1 Workload Characterization
The characterization of proxy workloads is critical in the design of effective cache replacement policies. The characteristics considered to be of importance in cache replacement decisions are described next. In this section, the paper presents a summary of the workload characterization study, focusing in particular on the characteristics that could influence proxy performance and cache replacement decisions.
4.1.1. Summary of Workload Characteristics
As stated in section 3.2, of the 20,725,429 requests recorded in 45 days of activity, the following tables summarize the total requests and other related information remaining after data processing and reduction.
The following table summarizes the reduced access logs for the proxy site, based on the average number of requests seen per day at the proxy server.
Table 1 Summary of proxy access log characteristics (Reduced data set)
Access Log Duration February 2, 2018 to March 24, 2018
Number of days 45
Total requests 7,310,038
Average requests per day 162,445
Total content data (Total Bytes Transferred) (GB) 89
Average content data per day (Average Bytes/Day) (MB) 2025
Unique Cacheable Requests 4,867,686
Unique Cacheable Content Bytes 69 GB
Total Uncacheable Requests 1,706,189
Total Uncacheable Content Bytes 8 GB
4.1.1.1 Cacheable/Uncacheable objects
In order for web caching to improve performance, it is vital that most objects be cacheable. In the experiments, a URL is considered to be cacheable if it does not contain substrings such as 'cgi-bin' or '?', if it does not have a file extension such as '.cgi', and if the origin server response contains an appropriate response status code (e.g., 200).
In this study, all requests except aborted (i.e., incomplete) transfers are used to drive the simulation. We divide the completed transfers into two (2) groups: cacheable requests and uncacheable requests. The cacheable requests are identified by the response status code recorded in the data set. Unfortunately, the data collected did not include the headers that would indicate which of these responses were uncacheable.
The first step in pre-processing the logs is to determine which objects in the logs are cacheable, and therefore available to be included in the simulations. Figure 4.1 shows the percentages of cacheable and uncacheable requests and bytes transferred in the logs. Analysis of the data set under study revealed that 89% of all requested objects (94% of the data transferred) were cacheable. The ratio of cacheable to uncacheable bytes is even higher, which also bodes well for reducing external traffic as it relates to byte hit rate performance.

Chart 1 Percentage of Cacheable/Uncacheable Requests

Chart 2 Percentage of Cacheable/Uncacheable Requests (Bytes)
4.1.1.2 Object Set Size
From Table 1, we see that over 4 million unique cacheable objects were requested during the measurement period. Due to the extremely large object set size, the proxy cache must be able to quickly determine whether a requested object is cached in order to reduce response latency. The proxy must also efficiently update its state on a cache hit, miss or replacement.
4.1.1.3 Object Size
One of the obstacles for web caching is working effectively with variable-sized objects. While most of the requested objects are small (the median object size in this data set was 4KB), some extremely large objects are available. The most frequently requested objects during the measurement period were images (76.38%) and HTML files (20.58%). We suspect that the higher access speeds available to clients are increasing both the number of large transfers and the maximum size of transfers. The issue for the proxy cache is to decide whether to cache a large number of small objects (which could potentially increase the hit rate) or to cache a few large objects (possibly increasing the byte hit rate).
4.1.1.4 Recency of Reference
Most web proxy caches in use today utilize the Least Recently Used (LRU) replacement policy (or some derivative of LRU). This policy works best when the access stream exhibits strong temporal locality or recency of reference (i.e. objects which have recently been referenced are likely to be re-referenced in the near future).
In this study, we found that one-third of all re-references to an object occurred within one hour of the previous reference to the same object, and approximately 56% of re-references occurred within 24 hours of the previous request. These results suggest that recency is a characteristic of web proxy workloads.
4.1.1.5 Frequency of reference
A frequency distribution of the number of accesses to each object in each workload is generated to determine the degree to which objects are re-accessed. One-timer documents receive special attention. Several studies have found that some web objects are substantially more popular than others; that is, web referencing patterns are non-uniform. An important aspect for caching regarding the frequency of reference is that of objects requested a single time; these objects are called "one-timers". One-timers are important because there is no benefit in caching a one-timer. Cache performance could be improved if these files could be readily identified so that they would not be stored in the cache.
4.1.1.6 Turnover
One final characteristic that could affect proxy cache replacement decisions is turnover in the active set of objects (i.e., the set of objects that users are currently interested in). Over time, the active set changes; objects that were once popular are no longer requested. These inactive objects should be removed from the cache to make space available for new objects that are now in the active set.
4.1.2 Response Status Codes
HTTP response status codes are important, especially for determining when a request results in a file transfer. The most important status codes for the simulations (because they result in a file transfer) are 'Successful' (200), 'Found' (302), and 'Not Modified' (304). For example, a reply code of 200 (OK) means that a valid document was made available to the client, either directly from the proxy cache, or by retrieving the document from another proxy cache or from the origin server. A reply code of 304 (Not Modified) implies that a client had issued a GET If-Modified-Since request to determine whether or not the client's cached copy of a document was up-to-date, and the server (or some intermediate proxy) replied indicating that the client has a valid copy of the document.
An HTTP response code of 206 (Partial Content) implies partial transfer of the document to the client in response to a partial GET request. Partial GETs are used to recover efficiently from partially failed transfers in HTTP/1.1. An HTTP response of 302 (Found) means that the requested document is known to reside in a different location than that specified by the URL. An unsuccessful response status code means that either no such document exists, the client did not have permission to access the document, or an error occurred (at the server or during network communication).
Table 4. Response status codes
Status code Percentage
200 (Successful) 89.3%
304 (Not Modified) 7.7%
302 (Found) 2.1%
Unsuccessful 0.9%
Total 100%

4.1.3 Relationship between workload characterization and simulation results
This section presents the way in which the workload characterization can aid in predicting the behavior of replacement policies. The characteristics found most relevant to the simulations are presented in the following sections.
4.1.3.1 Cacheable/Uncacheable objects and response status codes
The percentage of cacheable/uncacheable documents, combined with the successful status codes, offers an upper limit on how well any replacement policy can perform.
4.1.3.2 Object types
Object types are determined by the file extension in the requested URL. The increasing use of dynamic content on the web is relevant to web caching. The distribution of object types in the log traces can aid in the selection of replacement policies by offering insight into the ways in which the hit rate and byte hit rate might behave. The next step in the analysis classifies document requests (based on URL extensions) into HTML, Image, Audio, Video, Compressed and Formatted. Note that all documents with "cgi-bin" or "?" in the URL strings are classified as Dynamic documents. The content type information available in the access logs is used only if URL extensions do not classify the documents into one of the above-specified generic types.
In addition, object (document) size is one of the most important factors in web caching. In fact, it is because of the variability in object sizes that there are two performance metrics (hit rate, and byte hit rate) and not one, and why it is difficult to optimize both metrics at the same time.
For example, for a log where a high percentage of the bytes transferred come from large documents, such as media (sound, video) files, it should be easier to improve the byte hit rate by choosing a policy that gives preference to larger files (such as LRU, Size, LFU-DA and GD-Size(P)). On the other hand, a log where smaller files such as text and HTML are prevalent might make it easier to improve the hit rate by giving preference to a policy that caches smaller files, such as GD-Size or GDSF.
The following table shows the breakdown of document types and sizes in the proxy logs.
Table 2 Breakdown of Document Types and Sizes (%)
Item HTML Image Audio Video Compressed Formatted Dynamic Others
No. of requests 20.58 76.38 0.44 0.12 0.10 0.56 1.64 0.19
No. of bytes 17.86 49.01 5.74 12.36 5.12 7.89 0.24 1.78
From the table, we observed that the image type documents are responsible for the highest percentage of bytes transferred, followed by HTML documents. A substantial portion of the byte transfer volume is accounted for by video, audio, compressed, and formatted document types, which, despite their relatively infrequent access (typically close to 1% of the requests), are large enough to generate 20-30% of the bytes transferred.
4.1.3.3 One-Time Referencing
The following table summarizes the contribution of one-timers with respect to the number of distinct documents, the total requests, and the total bytes transferred. One-timers account for a substantial portion of the total requests (47%) and total bytes transferred (48%). In addition, approximately 76% of the documents referenced are one-timers. This is likely because requests made to a proxy are for a much larger population of web documents than those made to a single server.
Table 3 One-Time Referencing Behavior in web proxy workloads
Distinct Documents 4,571,539
One-Timer Documents 3,470,920
One-Timer/Distinct Documents (%) 75.92
Total Requests 7,310,038
One-Timer Requests 3,470,920
One-Timer/Total Requests (%) 47.48
Total Bytes Transferred (GB) 89
Total One-Timer Bytes (GB) 43
One-Timer/Total Bytes (%) 48.3
4.2 Replacement Policies Simulation Results
This section presents the replacement policy simulation results. In the following sub-sections, we present the factors and levels examined, the metrics used to evaluate the performance of each replacement policy, and finally a discussion of other issues regarding the simulation study.
As described in section 3.5, cache sizes and replacement policies are the two key factors in the experiments.
In this section, the results of the replacement policy simulations are presented for both hit rate and byte hit rate. Table 4 shows the hit rate results for the selected proxy server logs. As the table shows, LFU-DA is the best performer for small cache sizes.
Table 4 Proxy log hit rate performance ratings
Hit Rate Rating
Cache sizes LFU-DA LRU GDSF GD-Size
Small 10.000 1.397 2.045 0.455
Medium 6.596 8.381 10.000 2.150
Large 1.259 8.832 10.000 6.401
Average 5.952 6.135 7.348 2.679
Table 5 Proxy log byte hit rate performance ratings
Byte Hit Rate Rating
Cache sizes LFU-DA LRU GDSF GD-Size
Small 10.000 0.000 1.127 1.632
Medium 10.000 0.000 7.399 7.215
Large 9.959 0.000 6.538 9.426
Average 9.987 0.000 4.221 6.001
4.2.1 Final results
The following tables show the final results for the proxy logs, for the hit rate and byte hit rate respectively. As the results indicate, LFU-DA is the best hit rate performer on small cache sizes, with GDSF as the best overall performer for this measure. LFU-DA is the best overall byte hit rate performer. Based on these results, GDSF and LFU-DA are selected as the policies to be used in the virtual cache experiments.
Table 6 Final hit rate performance rating
Hit Rate Rating
Cache sizes LFU-DA LRU GDSF GD-Size
Small 10.000 2.469 3.027 0.918
Medium 4.266 6.947 9.965 2.958
Large 0.637 8.556 10.000 5.759
Average 4.968 5.756 7.735 2.837
Table 7 Final byte hit rate performance rating
Byte Hit Rate Rating
Cache sizes LFU-DA LRU GDSF GD-Size
Small 10.000 0.371 1.188 1.838
Medium 9.762 1.063 6.461 7.703
Large 8.081 1.339 7.340 7.713
Average 9.284 0.925 4.996 5.748
4.3 Virtual Cache simulation results
This section presents an analysis of the results of the virtual cache simulation experiments. As described in the previous section, the two (2) policies used for the virtual cache simulations are GDSF for hit rate and LFU-DA for byte hit rate. Results are presented in a similar manner to the cache replacement policy simulations. The outcomes of the experiments are presented for the proxy server logs, along with overall results.
Several points are examined regarding the performance of the virtual cache approach. These are:
The performance of the system as compared to the best-performing policies, GDSF and LFU-DA.

The relationship between the hit rate and byte hit rate performance of the virtual cache system and the percentage of space assigned to each virtual cache; in other words, whether hit rate and byte hit rate performance can be tuned by adjusting the percentage of space.
This section presents the results of the VC simulations for the selected proxy server logs. The results show that GDSF is the best hit rate performer, and that the hit rate decreases as the amount of cache space dedicated to GDSF is reduced. In addition, the results show that increasing the space assigned to LFU-DA does not always result in an increase in byte hit rate; in this case, the byte hit rate decreases for configurations where the space assigned to VC1 is more than 64%. GDSF and LFU-DA hold the best hit rate and byte hit rate, respectively. In addition, as the cache space assigned to a given metric is increased, the performance for that metric also increases.
4.3.2 Virtual Cache Control of Hit Rate and Byte Hit Rate
This section explores the level of control that the VC approach offers over the hit rate and byte hit rate. In the VC approach, both the hit rate and the byte hit rate performance have lower and upper bounds, that is, the lowest and highest performance that can be achieved by varying the cache space assignment between VC0 and VC1.

There are two key questions: how good is the VC approach at achieving the highest possible rates, and how much does optimizing for one metric affect the other? The following sections detail the results for each cache size category for the proxy server logs.

4.3.2.1. Small cache sizes
Figure 2 shows the results of the VC simulations with hit rate and byte hit rate side by side for LFU-DA, GDSF, and each of the VC cache size distributions. LFU-DA and GDSF are included as reference points. Ideally, for hit rate, the VC performance at each level should be better than the performance of LFU-DA; conversely, for the byte hit rate, the VC performance should be better than the performance of GDSF.

The effect of control of the VC approach is not expected to be very marked in small cache sizes. This is because for small cache sizes the best hit rate performer is LFU-DA and not GDSF. LFU-DA is the best performer for both metrics in proxy logs.

Chart 3 HR/BHR Performance Comparison (Small)
4.3.2.2. Medium cache sizes
Figure 3 shows the results for medium cache sizes. As shown in the figure, GDSF is the best hit rate performer and LFU-DA the best byte hit rate performer. The various VC configurations are competitive, especially with GDSF.

The difference between GDSF and the best-performing hit rate VC configuration is 0.42%, and for the byte hit rate and LFU-DA, 2.20%. The difference between the best and worst performance of the VC configurations is 6.00% for the hit rate and 4.3% for the byte hit rate.

Chart 4 HR/BHR Performance Comparison (Medium)
4.3.2.3. Large cache sizes
Figure 4 below shows that for the proxy log, GDSF is the best performer, but again by the smallest of margins (0.04%) over the best-performing VC configuration. LFU-DA is the best byte hit rate performer, but also by a very small margin (0.19%) over the best VC configuration. In this case, byte hit rate performance does not peak at the VC configuration where the largest amount of cache space is dedicated to LFU-DA. 1.08% separates the best from the worst hit rate performing VC configurations; this difference is 0.35% for the byte hit rate.

Chart 5 HR/BHR Performance Comparison (Large)
Chapter Five
5. Conclusion and Recommendation
5.1. Conclusion
The paper has presented a performance study of a web proxy cache based on the most significant proxy workload characterization to date. Trace-driven simulations were used to evaluate the performance of different cache replacement policies for proxy caches.
The workload results indicate that size-based policies (GDSF) achieve higher hit rates than other policies. The paper also found that frequency-based policies (LFU-DA) are more effective at improving the byte hit rate of a proxy cache. Furthermore, the paper developed virtual caches as an approach to provide optimal cache performance for multiple metrics simultaneously.
5.2 Recommendation
There are many open issues regarding web proxy performance and caching. Further work in this area may include finding better replacement policies that achieve higher byte hit rates, implementing efficient consistency mechanisms, and adding more functionality to web proxy caches (e.g. accounting and security). Another open issue is finding a better simulation tool that achieves higher hit rates and byte hit rates by comparing replacement policies with the concept of virtual caches.
References
1. http://www.ijarcsse.com/docs/papers/Volume_4/3_March2014/V4I3-0111.pdf. Online.
2. Cache Optimization on Hot-Point Proxy Caching Using Weighted-Rank Cache Replacement Policy. Ponnusamy, S. P., and E. Karthikeyan. 2013, ETRI Journal 35.

3. A Rank Based Replacement Policy for Multimedia Server Cache Using Zipf-Like Law. T. R. Gopalakrishnan Nair and P. Jayarekha. 3, 2010, Journal of Computing, Vol. 2.

4. An Explore View of Web Caching Techniques. Dilbag Singh, Sachin Kumar, and Soniya Kapoor. 2011, An Explore View of Web Caching Techniques, Vol. 1.

5. Integration of Least Recently Used Algorithm and Neuro-Fuzzy System into Client-side Web Caching. Waleed Ali and Siti Mariyam Shamsuddin. s.l. : IJCSS, 2009, International Journal of Computer Science and Security , Vols. 31-15.

6. Reducing Outgoing Traffic of Proxy Cache by Using Client-Cluster. Kyungbaek Kim, and Daeyeon Park. 8, 2006, Journal of Communications and Networks, pp. 330-338.

7. Design and Implementation of Server Side Web Proxy Caching Algorithm. K. Ramu and R. Sugumar. 1, 2012, International Journal of Advanced Research in Computer and Communication Engineering.

8. A new approach for a proxy-level web caching mechanism. C. Kumar, J.B. Norris,. 2008, Decision Support Systems.

9. Analysis of Web Caching Architectures:Hierarchical and Distributed Caching,. Rodriguez, P., Spanner, C., Biersack, and E. August 2001 , Vol. 9(4), pp. 404-418. IEEE/ACM Transactions on Networking.

10. web caching and its applications Caching Architectures. Nagaraj, S. V. New York : s.n., 2004, p. 24.

11. Load Balancing in Distributed Web Caching. Tiwari Rajeev and Khan Gulista. 2010. ,International Conference on Methods and models in science and technology. Vol. 324, pp. 341-345.

12. A New Hybrid Architecture for Cooperative Web Caching. Jinsuk Baek,Gurpreet Kaur and Junghoon Yang. MAY 2008 , journal of ubiquitous convergence and technology, Vol. 2. 1.

13. High performance cache replacement using re-reference interval prediction. A. Jaleel, K. B. Theobald, S. C. Steely Jr, and J. Emer. In ISCA-37, 2010.

14. Modeling cache performance beyond LRU. N. Beckmann and D. Sanchez. 2016.

15. https://www.ijcaonline.org/research/volume128/number10/nanda-2015-ijca-906656.pdf. Online 2015.

16. “Study of the web caching Algorithms for Performance Improvement of the response speed”. Dhawaleswar Rao, CH. 2012, Indian Journal of Computer Science and Engineering, Vol. 3.

17. Web proxy cache content classification based on support vector machine. Ali, W; Shamsuddin, S.M, Ismail, A.S. 2011, Journal of Artifical Intelliegence 4(1), pp. 100-109.

18. Evaluation of Efficient Web Caching and Prefetching Technique for Improving the Proxy Server Performance. Suresh Babu Kuppusamy and S. K. Srivatsa. 2009, Global Journal of Computer Science and Technology, Vol. 9.

19. “Fixed Segmented LRU Cache Replacement Scheme with Selective Caching.” . Kathlene Morales and Byeong Kil Lee. 2012 . In Performance Computing and Communications Conference (IPCCC), 1st International. pp. 199-200. IEEE.

20. Chen, H.T. “Pre-fetching and Re-fetching in Web Caching systems: Algorithms and Simulation. Master Thesis. OntarioPeterborough : s.n., 2008.

21. A Quantitative Study of Web Cache Replacement Strategies Using Simulation. In: Web eplacement Strategies Using Simulation. ElAarag, H. s.l. : Romano (Eds.), Springer Briefs in Computer , 2013. , pp. 17-60. . ISBN: 978-1-4471-4892-0.

22. “Presenting a Novel Page Replacement Algorithm Based on LRU,”. bolfazl Akbari, Khosrozadeh, Ali, Sanaz Pashmforoush, Maryam Bagheri, and Neda Beikmahdavi. 2012, Journal of Basic and Applied Scientific Research.

23. “A Survey of Web caching and Prefetching,”. Ali Waleed, Siti Mariyam Shamsuddin, and Abdul Samad Ismail,. 2011, International Journal of Advances in Soft Computing Applications, Vol. 3, pp. 18-44.

24. Web Latency Reduction Techniques: A Review Paper. Sirshendu Sekhar Ghosh and Aruna Jain. IJITNA, Vol. 1, pp. 20-26.

25. The LRU* WWW proxy cache document replacement algorithm. Chung-yi Chang,Tony McGregor,Geoffrey Holmes. Sep 17., 2010.

26. “An Overview of Web Caching Replacement Algorithms”. A. Balamash and M. Krunz. IEEE Communications Surveys & Tutorials, 2004.

27. Hybrid Cache Replacement policy for proxy server. 3, March 2013, International Journal of Advanced Research in computer and communication engineering , Vol. 2.

28. AYBAR, EDWIN. Obtimizing both hit rate and byte hit rate in web cache replacment polices . A THESIS IN SOFTWARE ENGINEERING. 2002.

Appendix A Virtual Cache Simulations
// Cache.h: Interface for the CCache class
#pragma once
#include <iostream>
#include <fstream>
#include <afxtempl.h>
#include "Node.h"
#include "constants.h"
class CCache
{
public:
    void ClearCache();
    CCache();
    virtual ~CCache();
    CList<CNode, CNode&> Data;                          // List holding the actual cache contents
    __int64 iSize[NUM_VC_EXPERIMENTS][NUM_EXPERIMENTS]; // Array containing the cache size levels
    __int64 iUsedBytes;                                 // Amount of space currently occupied in the cache
    bool bCacheArray[NUM_OBJECTS];                      // Tracks which objects are currently cached
    long double fL;                                     // Aging factor (GDSF)
};
// VCache.h: Interface for the CVCache class
#pragma once
#include "Cache.h"
#include "Node.h"
class CVCache
{
public:
    void ReduceAges();
    void RunPolicy2(int iVCExperimentNumber, int iExperimentNumber, int iCase, CNode node);
    void RunPolicy1(int iVCExperimentNumber, int iExperimentNumber, int iCase, CNode node);
    void Initialize();
    void HandleRequest(int iVCExperimentNumber, int iExperimentNumber, CNode node);
    void RunExperiments();
    CVCache();
    virtual ~CVCache();
    CCache Cache1;                 // First-level cache (LRU)
    CCache Cache2;                 // Second-level virtual cache (GDSF/LFU-DA)
    double fSize[20];              // Cache size fractions tested
    double fVCSizes[9];            // Fraction of the total size given to cache 1
    unsigned long iCache2Hits;     // Hits served by cache 2
    unsigned long iCache2ByteHits; // Byte hits served by cache 2
};
// Cache.cpp: Implementation of the CCache class
#include "stdafx.h"
#include "Cache.h"
#include "LFU_DA_GDSF.h"
#include "constants.h"
#ifdef _DEBUG
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#define new DEBUG_NEW
#endif
CCache::CCache()
{
    iUsedBytes = 0;
}
CCache::~CCache()
{
}
void CCache::ClearCache()
{
    // Remove all files from the cache
    Data.RemoveAll();
    iUsedBytes = 0;
    // Mark every object as not cached
    for (int i = 0; i < NUM_OBJECTS; i++)
        bCacheArray[i] = false;
    // Particular to GDSF: reset the aging factor
    fL = 0;
}
// VCache.cpp: Implementation of the CVCache class
#include "stdafx.h"
#include "LFU_DA_GDSF.h"
#include "VCache.h"
#include <iostream>
#include <fstream>
#include <string>
#include "afx.h"
#ifdef _DEBUG
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#define new DEBUG_NEW
#endif
using namespace std;
CVCache::CVCache()
{
}
CVCache::~CVCache()
{
}
void CVCache::RunExperiments()
{
    int iExpNumber, iVCExpNumber;
    CNode node;
    ifstream fileInput;
    ofstream fileOutput;
    unsigned long iRequest;
    char stringSize[1024];
    char stringBuffer[5];
    // Build the input file name from the trace log path
    CString strpath;
    CString strInFileName = strpath;
    strInFileName += "clean_";
    strInFileName += strCurrentLog;
    strInFileName += ".log";
    CString strOutFileName;
    for (iVCExpNumber = 0; iVCExpNumber < NUM_VC_EXPERIMENTS; iVCExpNumber++)
    {
        strOutFileName = strpath;
        strOutFileName += strpolicy;
        strOutFileName += strCurrentLog;
        itoa(iVCExpNumber + 1, stringBuffer, 10);
        strOutFileName += stringBuffer;
        strOutFileName += ".results";
        fileOutput.open(strOutFileName);
        // Print the header
        fileOutput << "VC Experiment #" << iVCExpNumber + 1 << " VC1=LRU " << (int)(fVCSizes[iVCExpNumber] * 100);
        fileOutput << "% VC2=GD-Size " << (1 - fVCSizes[iVCExpNumber]) * 100 << "%\n";
        fileOutput << "Cache Size (percentage) Cache Size (Absolute) Hit Rate Byte-Hit Rate ";
        fileOutput << "Cache1 Hit Rate Cache1 Byte Hit Rate Cache2 Hit Rate Cache2 Byte Hit Rate\n";
        for (iExpNumber = 0; iExpNumber < NUM_EXPERIMENTS; iExpNumber++)
        {
            Cache1.ClearCache(); // Prepare the caches for the experiment
            Cache2.ClearCache();
            stats.Initialize();
            iRequest = 0;
            iCache2Hits = 0;
            iCache2ByteHits = 0;
            printf("\nExperiment #: %d\n", iExpNumber + 1);
            fileInput.open(strInFileName);
            while (fileInput)
            {
                // Get the current request
                fileInput >> node.iID;
                fileInput >> node.iSize;
                if (node.iSize == 0)
                    node.iSize = 1;
                HandleRequest(iVCExpNumber, iExpNumber, node);
                iRequest++;
                if ((iRequest % 10000) == 0)
                {
                    printf("%f%% ", (double)iRequest / stats.iTotalRequestsInLog * 100);
                    if ((iRequest % 1000000) == 0)
                    {
                        printf("\nContinuing VC Experiment #: %d\n", iVCExpNumber + 1);
                        printf("Experiment #: %d\n", iExpNumber + 1);
                    }
                }
            }
            // Update statistics
            stats.fHR = (double)stats.iHits / stats.iTotalRequests;
            stats.fBHR = (double)stats.iByteHits / stats.iTotalBytes;
            // Print the current result line
            _i64toa(Cache1.iSize[iVCExpNumber][iExpNumber] + Cache2.iSize[iVCExpNumber][iExpNumber], stringSize, 10);
            fileOutput << fSize[iExpNumber] << " " << stringSize << " " << stats.fHR << " " << stats.fBHR << " ";
            fileOutput << (float)(stats.iHits - iCache2Hits) / stats.iHits << " " << (float)(stats.iByteHits - iCache2ByteHits) / stats.iByteHits << " ";
            fileOutput << (float)iCache2Hits / stats.iHits << " " << (float)iCache2ByteHits / stats.iByteHits << "\n";
            fileOutput.flush();
            fileInput.close();
        }
        fileOutput.close();
    }
}
void CVCache::HandleRequest(int iVCExperimentNumber, int iExperimentNumber, CNode node)
{
    stats.iTotalRequests++;
    stats.iTotalBytes += node.iSize;
    // If the object is in cache 1
    if (Cache1.bCacheArray[node.iID])
    {
        stats.iHits++; // Record the hit
        stats.iByteHits += node.iSize;
        // Run the replacement policy on cache 1 with 1 = no new object introduced into the cache
        RunPolicy1(iVCExperimentNumber, iExperimentNumber, 1, node);
    }
    // If the object is in cache 2
    else if (Cache2.bCacheArray[node.iID])
    {
        stats.iHits++; // Record the hit
        stats.iByteHits += node.iSize;
        iCache2Hits++;
        iCache2ByteHits += node.iSize;
        // Run the replacement policy on cache 2 with 1 = no new object introduced into the cache
        RunPolicy2(iVCExperimentNumber, iExperimentNumber, 1, node);
    }
    // If the object is in neither cache
    else
    {
        // Run the replacement policy on cache 1 with 0 = object not in cache
        RunPolicy1(iVCExperimentNumber, iExperimentNumber, 0, node);
    }
}
void CVCache::Initialize()
{
    unsigned long i;
    unsigned long j;
    __int32 iUniqueContentBytes;
    // Initialize the cache size fractions
    fSize[0] = 0.0006; fSize[1] = 0.0025; fSize[2] = 0.01; fSize[3] = 0.04; fSize[4] = 0.16; fSize[5] = 0.64; fSize[6] = 1.0;
    fVCSizes[0] = 0.1; fVCSizes[1] = 0.2; fVCSizes[2] = 0.3; fVCSizes[3] = 0.4;
    // Set up the statistics file name
    CString strStatsFileName = strpath;
    strStatsFileName += "arsi_";
    strStatsFileName += strCurrentLog;
    strStatsFileName += ".log";
    ifstream fileStats;
    fileStats.open(strStatsFileName);
    char stringBuffer[256];
    // Get the total number of requests
    for (i = 0; i < 4; i++)
        fileStats >> stringBuffer;
    fileStats >> stats.iTotalRequestsInLog;
    // Get the number of unique content bytes
    for (i = 0; i < 6; i++)
        fileStats >> stringBuffer;
    iUniqueContentBytes = _atoi32(stringBuffer);
    fileStats.close();
    // Initialize the cache arrays
    for (i = 0; i < NUM_OBJECTS; i++)
    {
        Cache1.bCacheArray[i] = false;
        Cache2.bCacheArray[i] = false;
    }
    // Initialize the cache sizes array: cache 1 gets the fVCSizes fraction of the total, cache 2 the rest
    for (i = 0; i < NUM_VC_EXPERIMENTS; i++)
    {
        for (j = 0; j < NUM_EXPERIMENTS; j++)
        {
            Cache1.iSize[i][j] = (__int32)(fSize[j] * fVCSizes[i] * iUniqueContentBytes);
            Cache2.iSize[i][j] = (__int32)(fSize[j] * iUniqueContentBytes) - Cache1.iSize[i][j];
        }
    }
    // Initialize statistics
    stats.Initialize();
}
void CVCache::RunPolicy1(int iVCExperimentNumber, int iExperimentNumber, int iCase, CNode node)
{
    POSITION pos;
    POSITION pos_before;
    CNode nodeInList;
    bool bFound = false;
    // Object is already in the cache
    if (iCase == 1)
    {
        // Find the object in the cache
        pos = Cache1.Data.GetHeadPosition();
        do
        {
            pos_before = pos;
            nodeInList = Cache1.Data.GetNext(pos);
            if (nodeInList.iID == node.iID)
            {
                bFound = true;
                Cache1.Data.RemoveAt(pos_before); // Update the cache
                if (nodeInList.iSize != node.iSize)
                {
                    Cache1.iUsedBytes -= nodeInList.iSize;
                    Cache1.iUsedBytes += node.iSize;
                }
                Cache1.Data.AddHead(node); // Move to the head (LRU)
            }
        } while ((!bFound) && (pos != NULL));
    }
    // Object is not in the cache
    else
    {
        Cache1.iUsedBytes += node.iSize; // Update the number of bytes occupied in the cache
        Cache1.Data.AddHead(node);
        Cache1.bCacheArray[node.iID] = true;
        // File does not fit in the cache
        if (Cache1.iUsedBytes > Cache1.iSize[iVCExperimentNumber][iExperimentNumber])
        {
            while ((Cache1.iUsedBytes > Cache1.iSize[iVCExperimentNumber][iExperimentNumber]) && (Cache1.Data.GetCount() > 0))
            { // Remove files until the cache size is reached
                // Remove the least recently used file from cache 1
                nodeInList = Cache1.Data.GetTail();
                Cache1.bCacheArray[nodeInList.iID] = false;
                Cache1.iUsedBytes -= nodeInList.iSize;
                Cache1.Data.RemoveTail();
                // Add the evicted object to cache 2
                RunPolicy2(iVCExperimentNumber, iExperimentNumber, 0, nodeInList);
            }
        }
    }
}
void CVCache::RunPolicy2(int iVCExperimentNumber, int iExperimentNumber, int iCase, CNode node)
{
    POSITION pos;
    POSITION pos_before;
    CNode nodeInList;
    bool bFound = false;
    // Object is already in the cache
    if (iCase == 1)
    {
        // Find the object in the cache
        pos = Cache2.Data.GetHeadPosition();
        do
        {
            pos_before = pos;
            nodeInList = Cache2.Data.GetNext(pos);
            if (nodeInList.iID == node.iID)
            {
                bFound = true;
                if (nodeInList.iSize != node.iSize)
                {
                    Cache2.iUsedBytes -= nodeInList.iSize;
                    Cache2.iUsedBytes += node.iSize;
                }
                nodeInList.fkey = (long double)nodeInList.fc / nodeInList.iSize + Cache2.fL;
                // Remove the object from cache 2
                Cache2.Data.RemoveAt(pos_before); // Update the cache
                Cache2.bCacheArray[node.iID] = false;
                Cache2.iUsedBytes -= node.iSize;
            }
        } while ((!bFound) && (pos != NULL));
        // Promote the object to cache 1
        RunPolicy1(iVCExperimentNumber, iExperimentNumber, 0, node);
    }
    // Object is not in the cache
    else
    {
        Cache2.iUsedBytes += node.iSize; // Update the number of bytes occupied in the cache
        // File does not fit in the cache
        if (Cache2.iUsedBytes > Cache2.iSize[iVCExperimentNumber][iExperimentNumber])
        {
            while ((Cache2.iUsedBytes > Cache2.iSize[iVCExperimentNumber][iExperimentNumber]) && (Cache2.Data.GetCount() > 0))
            { // Remove files until the cache size is reached
                nodeInList = Cache2.Data.GetTail();
                Cache2.bCacheArray[nodeInList.iID] = false;
                Cache2.iUsedBytes -= nodeInList.iSize;
                Cache2.Data.RemoveTail();
                // Update the aging factor fL
                Cache2.fL = nodeInList.fkey;
                // Particular to LFU-DA/GDSF: update the key of the new object
                node.fkey = (long double)node.fc / node.iSize + Cache2.fL;
            }
        }
        // Add the file to the cache
        Cache2.bCacheArray[node.iID] = true;
        // Locate the tail of the cache
        pos = Cache2.Data.GetTailPosition();
        // Walk backwards through the cache and insert the object in key order
        if (Cache2.Data.GetCount() > 0)
        {
            node.fkey = (long double)node.fc / node.iSize + Cache2.fL;
            do
            {
                pos_before = pos;
                nodeInList = Cache2.Data.GetPrev(pos);
            } while ((node.fkey >= nodeInList.fkey) && (pos != NULL));
            if (node.fkey >= nodeInList.fkey)
                Cache2.Data.InsertBefore(pos, node);
            else
                Cache2.Data.InsertAfter(pos_before, node);
        }
        else
        {
            node.fkey = (long double)node.fc / node.iSize + Cache2.fL;
            Cache2.Data.AddHead(node);
        }
    }
    // Check for overflow of fL and the keys
    if (Cache2.Data.GetCount() > 0)
    {
        nodeInList = Cache2.Data.GetHead();
        if (nodeInList.fkey > 1.2E+307)
            ReduceAges();
    }
}
void CVCache::ReduceAges()
{
    CNode nodeInList;
    POSITION pos;
    POSITION pos_before;
    // Subtract the smallest key (at the tail) from every key to prevent overflow
    nodeInList = Cache2.Data.GetTail();
    Cache2.fL = nodeInList.fkey;
    pos = Cache2.Data.GetHeadPosition();
    do
    {
        pos_before = pos;
        nodeInList = Cache2.Data.GetNext(pos);
        nodeInList.fkey -= Cache2.fL;
        Cache2.Data.RemoveAt(pos_before);
        Cache2.Data.InsertBefore(pos, nodeInList);
    } while (pos != NULL);
}
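The core of RunPolicy2 above is the GDSF-style priority key, key = cost/size + L, where the aging factor L is raised to the key of the last evicted object so that long-resident objects gradually lose priority. For readers without the MFC build environment, the eviction step can be sketched in portable standard C++. This is an illustrative re-implementation under assumed names (GdsfCache, admit), not the thesis code; it replaces the MFC CList with a std::multimap and omits the hit-promotion path.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Minimal portable sketch of the GDSF eviction logic in RunPolicy2.
// Hypothetical names; the thesis code stores the queue in an MFC CList.
struct GdsfCache {
    long double L = 0;               // aging factor
    long long usedBytes = 0;         // bytes currently cached
    long long capacity;              // cache size in bytes
    // key -> (object id, object size); the multimap keeps objects sorted by key
    std::multimap<long double, std::pair<unsigned long, long long>> byKey;

    explicit GdsfCache(long long cap) : capacity(cap) {}

    void admit(unsigned long id, long long size, long double cost) {
        usedBytes += size;
        // Evict smallest-key objects until the new object fits,
        // raising L to the key of the last victim (GDSF aging).
        while (usedBytes > capacity && !byKey.empty()) {
            auto victim = byKey.begin();
            usedBytes -= victim->second.second;
            L = victim->first;
            byKey.erase(victim);
        }
        // GDSF key: cost per byte plus the current aging factor.
        byKey.insert({cost / size + L, {id, size}});
    }
};
```

A full implementation would also handle the hit path, removing the object and re-inserting it with a refreshed key, as RunPolicy2 does when iCase == 1.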
Classes common to all simulations

// Node.h: Interface for the CNode and CStats classes
#if !defined(AFX_VCACHE_H_13ACBB34_F3A2_4643_B3AF_816990670555_INCLUDED_)
#define AFX_VCACHE_H_13ACBB34_F3A2_4643_B3AF_816990670555_INCLUDED_
#if _MSC_VER > 1000
#pragma once
#endif // _MSC_VER > 1000
// Class that defines a node in the cache priority queue
class CNode
{
public:
    unsigned long iID;   // Document ID
    unsigned long iSize; // Document size
    long double fkey;    // Sorting key
    long double fc;      // Cost
    CNode();
    virtual ~CNode();
};
// Class that holds hit statistics
class CStats
{
public:
    void CalculateStats(); // Calculate rate statistics
    CStats();
    virtual ~CStats();
    unsigned long iHits;               // Number of hits
    __int32 iByteHits;                 // Number of byte hits
    unsigned long iTotalRequests;      // Total number of requests
    __int32 iTotalBytes;               // Total number of bytes
    unsigned long iTotalRequestsInLog; // Total number of requests in the log
    long double fHR;                   // Hit rate
    long double fBHR;                  // Byte hit rate
};
#endif // !defined(AFX_VCACHE_H_13ACBB34_F3A2_4643_B3AF_816990670555_INCLUDED_)
// Node.cpp: Implementation of the CNode and CStats classes
#include "stdafx.h"
#include "Node.h"
#include "GD_Size.h"
#ifdef _DEBUG
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#define new DEBUG_NEW
#endif
// Construction/destruction
CNode::CNode()
{
    iID = 0;
    iSize = 1;
    fc = 1;
}
CNode::~CNode()
{
}
//CStats Class
CStats::CStats()
{
iHits=0;
iByteHits=0;
iTotalRequests=0;
iTotalBytes=0;
fHR=0;
fBHR=0;
}
CStats::~CStats()
{
}
void CStats::CalculateStats()
{
fHR =(long double) iHits/ iTotalRequests;
fBHR=(long double) iByteHits/iTotalBytes;
}