8 June, 2005 at 04:50

Zipf, G. K. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Reading, MA: Addison-Wesley, 1949.


Have to get hold of this sometime. This paper is cited in the following contexts:




Proxy-Assisted Techniques for Delivering Continuous Multimedia Streams – Lixin Gao

…. we assume that client requests arrive at a video server according to a Poisson distribution with an average rate $\lambda$ (i.e. the average interarrival time between consecutive requests is $1/\lambda$). For a given request, the probability distribution of video selection obeys a Zipf-like distribution [10]: for a collection of $N$ video objects, the probability of selecting video object $i$, $i = 1, 2, \ldots, N$, is $F_i = g_i / \sum_{j=1}^{N} g_j$, where $g_i = 1/i^{1-\theta}$. Here $\theta$ denotes the skew factor in video access patterns. In our simulations, we use $\theta = 0.271$. This value of $\theta$ is known to closely match the ….

G. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, 1949.
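
The selection probabilities in this excerpt can be sketched in a few lines. This is only a rough illustration: the exponent is lost in the extracted text, so the form g_i = 1 / i^(1 - theta) is an assumption (it is the form commonly paired with a skew of 0.271), and the function name is made up for the example.

```python
def zipf_like_probabilities(num_items: int, theta: float) -> list[float]:
    """Return F_i = g_i / sum_j g_j for i = 1..num_items.

    Assumes g_i = 1 / i**(1 - theta); this exponent form is an assumption,
    not something the excerpt states explicitly.
    """
    if num_items < 1:
        raise ValueError("num_items must be at least 1")
    weights = [1.0 / (i ** (1.0 - theta)) for i in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Print the selection probability of each of five hypothetical videos.
    for rank, p in enumerate(zipf_like_probabilities(5, theta=0.271), start=1):
        print(f"video {rank}: {p:.4f}")
```

Under this convention theta = 1 gives a uniform choice and smaller theta gives a more skewed one.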


Mining Significant Associations in Large Scale Text Corpora – Prabhakar Raghavan Verity   (Correct)

….(as opposed to all terms) is shown [8, 14] to yield clusters that are purer in the concepts they yield. This work was conducted while the author was visiting Verity Inc. Text as a domain: Large scale text corpora are intrinsically different from structured databases. First, it is known [15, 22] that terms in text have skewed distributions. How can we exploit these distributional phenomena? Second, as shown by our experiments, co-occurrences of terms themselves have interesting distributions; how can one exploit these to mine the associations quickly? Third, many statistically ….

G. K. Zipf. Human behavior and the principle of least effort. New York: Hafner, 1949.


Mining HTML Pages to Support Document Sharing in – Cooperative System Donato   (Correct)

….relevant tokens to be used in the bag-of-words representation. These tokens will be called features. As already observed in Luhn's seminal work [10], when distinct words in a textual document are arranged in decreasing order of their frequency of occurrence, the distribution satisfies Zipf's Law [20], that is, the product rank × frequency is constant. Luhn conjectured that the relevant words extracted from a document text would peak in the middle range, and further proposed to use words with medium frequency, because high- and low-frequency words are not good content identifiers. These ….

G.K. Zipf (1949). Human Behavior and the Principle of Least Effort. Reading, MA: Addison-Wesley.
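
The "rank × frequency is roughly constant" claim is easy to check on any text. A minimal sketch follows; the tokenizer (lowercase runs of letters and apostrophes) and the function name are assumptions made for the example, not anything taken from the cited paper.

```python
import re
from collections import Counter


def rank_frequency_table(text: str) -> list[tuple[int, str, int, int]]:
    """Return (rank, word, frequency, rank * frequency) rows, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending frequency, breaking ties alphabetically so the ranks are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for row in rank_frequency_table(sample):
        print(row)
```

On a large corpus the last column should stay within the same order of magnitude for the higher ranks if the text is Zipf-like; on a toy sentence like the one above it only shows the mechanics.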


Integrating IR and RDBMS Using Cooperative Indexing – Samuel Defazio Amjad (1995)   (20 citations)  (Correct)

….for our experiments was extracted from the News collection [TOMASIC93]. In Table 1, we present some of the statistical properties for this corpus. It should be noted that the News collection exhibits typical word usage properties. That is, the word usage patterns can be characterized with a Zipf [ZIPF49] distribution.

Table 1: Statistical Properties of the News Collection
Total Documents: 138,578
Average Document Size: 4,950 Bytes
Total Unique Words: 788,256
Total Word Occurrences: 48,526,577
Average Occurrences per Word: 61
Frequent Words: 39,413
Infrequent Words: 748,843
Frequent Word ….

G.K. Zipf. Human Behavior and the Principle of Least Effort, Addison-Wesley Press, 1949.


Mining Newsgroups Using Networks Arising From Social.. – Agrawal, Rajagopalan.. (2003)   (1 citation)  (Correct)

….cross from one class to the other; i.e. purity determines the expected number of antagonistic links in the network. The following is a description of the data generation algorithm. 1. For each author v, the number of responses pv that v posts is a random variable drawn from a Zipf distribution [30] with mean I and theta . All 3 real datasets follow a Zipf distribution for the number of postings versus rank of author, as shown in Figure 9 for the Gun Control dataset. 2. Randomly set S fraction of authors as for and the remaining as against . 3. For each author, select the other users ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.


Modeling Query-Based Access to Text Databases – Agichtein, Ipeirotis, Gravano (2003)   (Correct)

….quantitative analysis of the relative sizes of the different parts of the graph. We conjecture that the reachability graph of a database for tasks such as Tasks 1 and 2 tends to belong to the well studied family of power law graphs. Power law distributions have been known to arise in text domains [12]; additionally, power law graphs have recently been observed to be a good model for graphs in related domains such as the web [3] and the Internet [7] graphs. One property of interest of power law graphs is that the size of their connected components can be estimated using only a small number of ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.


Efficient Routing in Networks with Long Range Contacts.. – Barrière, al. (2001)   (1 citation)  (Correct)

….nodes $u$ and $v$, the probability for $u$ to have $v$ as long range contact is given by $p_r(u, v) = d(u, v)^{-r} / \sum_{w \neq u} d(u, w)^{-r}$, where $d(\cdot, \cdot)$ is the distance function in the network. The uniform distribution (which is obtained for $r = 0$), i.e. $p(i, j) \approx 1/n$, and the Zipf distribution [18] (which is obtained for $r = 1 - \log 0.80 / \log 0.20$) are two examples of harmonic distributions. We have performed an exhaustive study of the performances of greedy routing in the ring augmented with harmonic long range contacts, for all $r \geq 0$. Table 1 summarizes our results. One important ….

G. K. Zipf, “Human Behavior and the Principle of Least Effort”, Addison Wesley, Cambridge MA, 1949, Reprint New York, Hafner, 1972.


Overlay Caching Scheme for Overlay Networks – Tran, Tavanapong (2002)   (Correct)

….rate of 20 requests per minute. In the simulations, clients do not renege. The requested videos are always watched entirely without any interruption. A simulation run is considered complete when all the requests are completely serviced. The popularity of each video follows a Zipf distribution [15]. A large skew factor means that some videos are requested much more often than others. A skew factor of zero means that each video is requested equally often. The default skew factor is set to 0.7, a typical skew factor for video on demand applications [16]. 4.2 Simulation results 4.2.1 Effect of ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Mass, 1949.
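
To see what a skew factor does to a request stream, here is a small simulation sketch. It assumes popularity weights of the form 1 / rank^skew, which matches the excerpt's statement that a skew of zero means equal popularity; the function names, the seed, and the request counts are hypothetical.

```python
import random
from collections import Counter


def zipf_weights(num_videos: int, skew: float) -> list[float]:
    """Unnormalized popularity weights 1 / rank**skew; skew = 0 gives equal weights."""
    return [1.0 / (rank ** skew) for rank in range(1, num_videos + 1)]


def simulate_requests(num_videos: int, skew: float, num_requests: int, seed: int = 0) -> Counter:
    """Draw 1-based video ids with probability proportional to zipf_weights."""
    rng = random.Random(seed)
    video_ids = list(range(1, num_videos + 1))
    picks = rng.choices(video_ids, weights=zipf_weights(num_videos, skew), k=num_requests)
    return Counter(picks)


if __name__ == "__main__":
    counts = simulate_requests(num_videos=20, skew=0.7, num_requests=10_000)
    # Show the five most requested videos and how often each was picked.
    for video_id, hits in counts.most_common(5):
        print(video_id, hits)
```

Rerunning with skew=0.0 should flatten the counts, which is a quick sanity check on the convention.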


Sentence 782 of The New C Standard – Jones (2003)   (Correct)

….cultural backgrounds) Zipf noticed a relationship between the frequency of occurrence of some construct, created by some operation performed by people, and the effort needed to perform them. He proposed an explanation based on the principle of least effort. What has become known as Zipf's law [297] states a relationship between the rank and frequency of occurrence of some construct or behavior. Perhaps its most famous instantiation relates to words, $r = C / f_r$ (where $r$ is 1 for the most frequently occurring word, 2 for the second most frequently occurring and so on; $f_r$ is the number of times ….

G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison Wesley, 1949.


A Model for Discovering Customer Value for E-Content – Jagannathan, Nayak.. (2002)   (Correct)

….and this process is repeated forever. Since there are continuous price experiments, any changes in customer behavior will be detected by the algorithm. There are a few problems in the algorithm outlined above. First, customer preferences for products can be expected to follow a Zipf distribution[16]. This means that request arrival rates will be highly disproportionate. Therefore, the interval for observation to estimate the rate of acceptance could be very large. Because the trial price can be suboptimal, the greater the time spent charging the trial price, the greater the loss of revenue. ….

G. Zipf. Human Behavior and the Principle of Least Effort, an Introduction to Human Ecology. Addison-Wesley, 1949.


Characterizing Web Usage Regularities with Information Foraging .. – Liu, Zhang (2004)   (2 citations)  (Correct)

….Recently, researchers have identified several interesting, self organized regularities related to the Web, ranging from the growth and evolution of the Web to the usage patterns in Web surfing. Many regularities are best represented by characteristic distributions following either a Zipf like law [46] or a power law. That is, if probability of a variant taking value is proportional to is from to . A distribution presents a heavy tail if its upper tail declines like a power law [17] What follows lists some of the empirical regularities that have been found on the World Wide Web: ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA, 1949.


Channelized Partitioning Problem In Multi-Rate.. -..   (Correct)

….the adaptation granularity on the receiver's side is considerably coarse which causes mismatches between a receiver's expected bandwidth and the actually delivered video bandwidth. In addition, video programs are of different interest. Some hot programs attract many more receivers than others [6]. When considering the overall system utilization, it is clearly inefficient to use the same coding structure for all the sessions. There are two possible methods to reduce this bandwidth mismatch. The first is to use a large number of layers or streams ….

G. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley Press, 1949.


Quotient Cube: How to Summarize the Semantics of a Data Cube – Lakshmanan, Pei, Han (2002)   (1 citation)  (Correct)

….functions, but the run time and reduction ratio will be quite different depending on the aggregate function used. In addition, for testing the effectiveness of quotient cube w.r.t. AVG, we implemented a version of Algorithm 2. We refer to it as QC AVG below. We used the Zipf distribution [21] for generating synthetic data. It is a standard data set used for testing performance of algorithms under a variety of conditions. In addition, we also used the real dataset containing weather conditions at various weather stations on land for September 1985 [9] This weather dataset has been ….

G.K. Zipf. Human Behavior and The Principle of Least Effort. Addison-Wesley, 1949.


Mining Knowledge-Sharing Sites for Viral Marketing – Richardson, Domingos (2002)   (4 citations)  (Correct)

….individual users . With over 75k users and 500k edges in its web of trust, and 586k reviews over 104k products, Epinions is an ideal source for experiments on social networks and viral marketing. Interestingly, we found that the distribution of trust relationships in the web of trust is Zipfian [25], as has been found in many social networks [24] This is evidence that the web of trust is a representative example of a social network, and thus is a good basis for our study. A Zipfian distribution of trust is also indicative of a skewed distribution of network values, and therefore of the ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Boston, MA, 1949.


Clustering for Opportunistic Communication – Jay Budzik Shannon (2002)   (1 citation)  (Correct)

….This is only a lower bound on the number of unique pages viewed, because the same URL can contain different content (due to a form submission, for example). These pages were accessed a total of 5039 times. As previous work (e.g. [11]) suggests, Web access data follow a Zipf distribution [18]. That is, if the frequency of a page with frequency rank $i$ is $f$, where the frequency rank $i$ is the index of the $i$-th element in the sequence of documents accessed by descending frequency, then Zipf's Law states $f \propto i^{-\gamma}$, where $\gamma$ is close to 1. The data we gathered follow this distribution, with $\gamma$ ….

Zipf, G., Human Behavior and the Principle of Least-Effort. Cambridge, MA, USA: Addison-Wesley, 1949.


A Novel Approach to Managing Consistency in Content Distribution.. – Fei (2001)   (12 citations)  (Correct)

….size of an invalidation for each object is 100 Bytes. Assume that there are ten thousand objects at the origin server and the total request rate to these objects is either 0.1, 1 or 10 million times per day. We assume the distribution of requests to different objects follows the Zipf distribution [9, 10]. If we order the objects from 1 to 10,000 according to their popularities, the probability of requesting objects ## ## # ###will be proportional to # # ###. Each request will be made to one replica selected randomly. Next problem is to determine the inter update time (the inverse of the ….

G. Zipf, ed., Human Behavior and the Principle of Least Effort. Reading, MA: Addison-Wesley, 1949.


Probabilistic Information Retrieval Model for Dependency.. – Lee, Lee   (Correct)

….effective in practical situations. Losee also proposed that Expected Mutual Information Measure (EMIM) is superior to Inverse Document Frequency (IDF) for the weighting function [11] In fact, the two measures are actually similar to each other on the theoretical ground based on Luhn [4] and Zipf [20] model, but it is meaningful to have some empirical experiment results. Van Rijsbergen explored one way of removing the independence assumption [1] He constructed a probabilistic model incorporating dependences between index terms. The extent to which two index terms depend on one another is ….

Zipf, G.K. Human Behavior and the Principle of Least Effort. Addison-Wesley, Reading, Mass. 1949.


Progress Towards Recognizing and Classifying Beautiful Music – With Computers.. (2002)   Self-citation (Zipf)   (Correct)

No context found.

Zipf, G.K. (1949), Human Behavior and the Principle of Least Effort, Addison-Wesley.


An Empirical Analysis of C Preprocessor Use – Michael Ernst Greg (2002)   (4 citations)  (Correct)

No context found.

G.K. Zipf, Human Behavior and the Principle of Least Effort. Cambridge, Mass.: Addison-Wesley, 1949.


Generating Representative Web Workloads for Network and.. – Paul Barford And (1997)   (55 citations)  (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, 1949.


Analyzing Client Interactivity in Streaming Media – Italo (2004)   (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, 1949.


Structured Databases on the Web: Observations and Implications (2004)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, Massachusetts, 1949.


Memory-Limited Execution of Windowed Stream Joins – Utkarsh Srivastava Jennifer (2004)   (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Inc., 1949.


Selection, Tinkering, and Emergence in Complex.. – Sole.. (2003)   (Correct)

No context found.

Zipf, G.K. Human Behavior and the Principle of Least Effort. An Introduction to Human Ecology. New York: Hafner reprint,


Using the Web as a Measure of Familiarity and Obscurity – David Shamma Sara   (Correct)

No context found.

G. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, USA, 1949.


Supporting Cooperative Caching in Ad Hoc Networks – Liangzhong Yin (2004)

No context found.

G. Zipf, “Human Behavior and the Principle of Least Effort,” Addison-Wesley, 1949.


Distributed and Parallel Databases, 15, 219–236, 2004 c – Parallel Rolap Data   (Correct)

No context found.

G. Zipf, Human Behavior and The Principle of Least Effort, Addison-Wesley, 1949.


Placement Problems for Transparent Data Replication Proxy Services – Xu, Li, Lee (2002)   (1 citation)  (Correct)

No context found.

G. K. Zipf, Human Behavior and the Principle of Least Effort. Reading, MA: Addison-Wesley, 1949.


A Generalized Target-Driven Cache Replacement Policy for.. – Yin, Cao, Cai (2003)   (Correct)

No context found.

G. Zipf, “Human Behavior and the Principle of Least Effort,” Addison-Wesley, 1949.


An empirical study on Principal – Component Analysis For   (Correct)

No context found.

Zipf, G. K. (1949) Human behavior and the principle of least effort. Addison-Wesley.


Range CUBE: Efficient Cube Computation by Exploiting Data.. – Ying Feng Divyakant (2003)   (Correct)

No context found.

G.K. Zipf. Human Behavior and The Principle of Least Effort. Addison-Wesley, 1949.


Agent-Based Characterization of Web Regularities – Jiming Liu

No context found.

Zipf, G. K. (1949): Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA.


Journal of Visual Communication and Image Representation 10.. – Article Id Jvci   (Correct)

No context found.

G. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, Reading, MA,


Data Indexing in Peer-to-Peer DHT Networks – Garc Es-Erice Felber   (Correct)

No context found.

G.K. Zipf, Human Behavior and the principle of least effort, Addison-Wesley Press, Cambridge (MA), 1949.


A Novel Caching Scheme for Internet based Mobile Ad Hoc.. – Sunho Lim Wang-Chien   (Correct)

No context found.

G. K. Zipf, Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA, 1949.


A Case for a Generalized Periodic Broadcast Server that.. – Broadcast Revenue Design   (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Mass, 1949.


The Intelligent Surfer: – Probabilistic Combination Of   (Correct)

No context found.

G. K. Zipf (1949). Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA.


Measuring and Modelling the Group Membership in the Internet – Jun-Hong Cui (2003)

No context found.

G. K. Zipf. Human behavior and the principle of least effort. Reading, MA: Addison-Wesley, 1949.


Espoo 2003 HUT-TCS-A77 – Nonuniform   (Correct)

No context found.

G. K. Zipf. Human behavior and the principle of least effort. Addison-Wesley Press, Cambridge, MA, USA, 1949.


Hierarchical Routing with Soft-State Replicas in TerraDir – Bujor Silaghi Vijay   (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA, 1949.


The Web as a graph – Ravi Kumar Prabhakar   (24 citations)  (Correct)

No context found.

G. K. Zipf. Human behavior and the principle of least effort. New York: Hafner, 1949.


An Empirical Analysis of C Preprocessor Use – Ernst, Badros, Notkin (2002)   (4 citations)  (Correct)

No context found.

G.K. Zipf, Human Behavior and the Principle of Least Effort. Cambridge, Mass.: Addison-Wesley, 1949.


Supporting Cooperative Caching in Ad Hoc Networks – Liangzhong Yin

No context found.

G. Zipf, “Human Behavior and the Principle of Least Effort,” Addison-Wesley, 1949.


Predicting Change Propagation in Software Systems – Ahmed Hassan And (2004)   (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.


Data Indexing in Peer-to-Peer DHT Networks – Garc Es-Erice Felber   (Correct)

No context found.

G. K. Zipf. Human Behavior and the principle of least effort. Addison-Wesley Press, Cambridge (MA), 1949.


The Intelligent Surfer: Probabilistic Combination of Link.. – Richardson, Domingos (2002)   (30 citations)  (Correct)

No context found.

G. K. Zipf (1949). Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA.


E-Content Pricing: Analysis and Simulation – Srinivasan Jagannathan Jayanth   (Correct)

No context found.

G. Zipf, Human Behavior and the Principle of Least Effort, an Introduction to Human Ecology. Addison-Wesley, 1949.


Likelihood-Based Inference for Stochastic Models of Sexual.. – Handcock   (Correct)

No context found.

Zipf, G. (1949) Human behavior and the principle of least effort (Addison-Wesley, New York).


Load Sharing in Distributed VoD (Video on Demand) Systems – González.. (2002)   (Correct)

No context found.

G. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge, Massachusetts, 1949.



In Search of Invariants for E-Business Workloads – Menascé, Almeida, Fonseca (2000)   (2 citations)  (Correct)

….[Figure 12: Popularity of search terms.] the fact that the search function is used by robots, which behave differently from human users. For instance, the spike observed in day 3 results from an unexpected number of requests for the home page. 6.1 Popularity of Search Terms. Zipf's law [17] was originally applied to the relationship between a word's popularity in terms of rank and its frequency of use. It states that if one ranks the popularity (denoted by $p$) of words used in a given text by their frequency of use (denoted by $P$) then $P \propto 1/p$. Figure 12 shows that Zipf's law ….

G. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge, MA, 1949.


My cache or yours? Making storage more exclusive – Wong, Wilkes (2002)   (Correct)

….from inclusive to exclusive caching would reduce the mean latency from $0.5 \times (T_a + T_d)$ to $T_a$, i.e. from 5.1 ms to 0.2 ms. 2.2 Zipf workloads. Even workloads that achieve high client hit rates may benefit from exclusive caching. An example of such a workload is one with a Zipf-like distribution [49], which approximates many common access patterns: a few blocks are frequently accessed, others much less often. This is formalized as setting the probability of a READ for the $i$-th block proportional to $1/i^a$, where $a$ is a scaling constant commonly set to 1. Consider the cumulative hit rate ….

….the array should replace, and we trust the client to be well behaved. Recent studies of cooperative World Wide Web caching protocols [1, 20, 47] look at policies beyond LRU and MRU. Previously, analyses of web request traces [2, 5, 8] showed the file popularity distributions to be Zipf like [49]. It is possible that schemes tuned for these workloads will perform as well for the sequential or random access patterns found in file system workloads, but a comprehensive evaluation of them is outside the scope of this paper. In addition, web caching, with its potentially millions of clients, ….

G. K. Zipf. Human Behavior and Principle of Least Effort. Addison-Wesley Press, Cambridge, MA, 1949.
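
The "cumulative hit rate" idea in the first excerpt can be illustrated with a short calculation. This is a sketch under a simplifying assumption that is mine, not the paper's: the cache holds exactly the most popular blocks, and READ probabilities are proportional to 1 / i^a. The function name is hypothetical.

```python
def zipf_cumulative_hit_rate(cached_blocks: int, total_blocks: int, a: float = 1.0) -> float:
    """Fraction of READs served by a cache holding the `cached_blocks` most popular blocks.

    Assumes the probability of reading the i-th most popular block is
    proportional to 1 / i**a and that the cache content is ideal.
    """
    if total_blocks < 1 or not 0 <= cached_blocks <= total_blocks:
        raise ValueError("need 0 <= cached_blocks <= total_blocks and total_blocks >= 1")
    weights = [1.0 / (i ** a) for i in range(1, total_blocks + 1)]
    return sum(weights[:cached_blocks]) / sum(weights)


if __name__ == "__main__":
    # Hit rate as the cache grows from 1% to 100% of a hypothetical 10,000-block working set.
    for percent in (1, 10, 50, 100):
        cached = 10_000 * percent // 100
        print(percent, round(zipf_cumulative_hit_rate(cached, 10_000), 3))
```

The curve rises quickly at first and then flattens, which is the usual argument for why a second-level cache sees a much less skewed stream.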


Supporting Online Resource Discovery in the Context of Ongoing . . .. – Buzdik   (Correct)

….This is only a lower bound on the number of unique pages viewed, because the same URL can contain different content (due to a form submission, for example). These pages were accessed a total of 5039 times. As previous work (e.g. [20]) suggests, Web access data follow a Zipf distribution [36]. That is, if the frequency of a page with frequency rank $i$ is $f$, where the frequency rank $i$ is the index of the $i$-th element in the sequence of documents accessed by descending frequency, then Zipf's Law states $f \propto i^{-\gamma}$, where $\gamma$ is close to 1. The data we gathered follow this distribution, with ….

Zipf, G., Human Behavior and the Principle of Least-Effort. Cambridge, MA, USA: Addison-Wesley, 1949.


TCP-SMO: Extending TCP to Support Medium-Scale Multicast.. – Liang, Cheriton (2002)   (1 citation)  (Correct)

….hundreds of participants or receivers and rarely thousands. It is logistically difficult to get larger numbers of receivers all ready to receive at the same time except for the few events with truly mass appeal. Some researchers [5] have suggested that the sizes of multicast channels follow a Zipf [6] distribution, meaning that most multicast channels are small except for a few extremely popular ones, and such distribution may not change significantly in the future. Moreover, current practice is to distribute highdemand content hierarchically, from a central source to web cache servers that ….

G.K. Zipf, Human Behavior and the Principle of Least-Effort, Addison-Wesley, Cambridge, MA, 1949.


Self-Organized Criticality and Mass Extinction in Evolutionary.. – Krink, Ren (2001)   (1 citation)  (Correct)

….with the functional relationship of a power law: x x . When plotted on a log log scale, power law data can be described by a linear fit with a negative slope. Various complex systems follow these power laws such as light emitted from a quasar [13] Zipf s law of population sizes in cities [17], or the Gutenberg Richter law of earthquakes [7] The mechanism that causes self organized criticality is based on local interactions between many components in an open system. Most state transitions between the components only affect their neighborhood, but once in a while entire avalanches of ….

ZIPF, G. K. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA, 1949.
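
The remark that power-law data fall on a straight line with negative slope on a log-log plot suggests a quick way to estimate the exponent. Below is a minimal sketch using an ordinary least-squares fit on the logarithms; it is a rough diagnostic rather than a rigorous estimator, and the function name is made up for the example.

```python
import math


def fit_power_law_exponent(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares fit of log(y) = intercept + slope * log(x); returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    ranks = [float(r) for r in range(1, 51)]
    sizes = [5.0 * r ** -2.0 for r in ranks]  # synthetic data with exponent -2
    print(fit_power_law_exponent(ranks, sizes))
```

For real data the fit is usually restricted to the range where the log-log plot actually looks linear.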


Optimization Techniques under Uncertain Criteria, and.. – Kreinovich, Aló (2002)   (Correct)

….well, we should choose $t$ most frequent ones. A skill of a student can be thus characterized by the number $t$ of the types of items in which this student is well skilled. To estimate frequencies of different types, we can use a general (semi-empirical) law discovered by G. K. Zipf (see, e.g. [7, 12]) according to which, if we order types from the most frequent to the least frequent one, then the frequency $f_i$ of the $i$-th type is proportional to $1/i$: $f_i = c/i$ for some constant $c$. The value of this constant can be determined from the fact that the sum of all these frequencies should be equal to ….

G. K. Zipf, Human behavior and the principle of least-effort, Addison-Wesley, Cambridge, MA, 1949.
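
The excerpt breaks off before giving the constant. Assuming there are $n$ types and the frequencies are normalized to sum to 1 (both are assumptions, since the sentence is truncated), the step it describes would read:

```latex
\sum_{i=1}^{n} f_i \;=\; \sum_{i=1}^{n} \frac{c}{i} \;=\; c\,H_n \;=\; 1
\quad\Longrightarrow\quad
c \;=\; \frac{1}{H_n},
\qquad
H_n \;=\; \sum_{i=1}^{n} \frac{1}{i} \;\approx\; \ln n + 0.5772 .
```

For example, with $n = 10$ the harmonic number is $H_{10} \approx 2.929$, so $c \approx 0.341$ and the most frequent type accounts for roughly a third of all items.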


Overcoming Limitations of Sampling for Aggregation Queries – Chaudhuri, Das, Datar (1999)   (7 citations)  (Correct)

….from a uniform distribution. Since we were interested in comparing the alternatives across different data distributions, we modified the TPC R data generation program to generate data with varying degree of skew. The modified program generates data for each column in the schema from a Zipfian [20] distribution determined by the Zipfian parameter z (footnote 5: We have made this program (which runs on x86 Windows NT platform) available for public download from [6]). For our experiments we generated 100MB TPC R databases by varying z over values 1, 1.5, 2, 2.5, and 3. The ratio of the maximum ….

Zipf, G. K. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Inc, 1949.


Model-Based Clustering and Data Transformations.. – Yeung, Fraley.. (2001)   (10 citations)  (Correct)

….for gene i, which is chosen according to a normal distribution with mean 3 and standard deviation 0.5. i; j) models the cyclic behavior. Each cycle is assumed to span 8 time points (experiments) k is the class number, and the sizes of the different classes are generated according to Zipf s Law [Zipf, 1949]. Different classes are represented by different phase shifts w k , which are chosen according to the uniform distribution in the interval [0; 2 ] The random variable , which represents the noise of gene synchronization, is generated according to the standard normal distribution. The parameter ….

Zipf, G. K. (1949) Human behavior and the principle of least effort. Addison-Wesley.


Hybrid Multimedia-on-Demand Systems – Eric Wong Vicki   (Correct)

….probability against relative service charge with different Zipf's parameters using the optional scheme. We assume that the system can support $N = 1000$ concurrent multimedia streams and can store $C = 100$ multimedia contents. Let the popularity of the multimedia contents follow Zipf's distribution [16], given by $a_i = A / i^{(1-s)}$, where $A$ is a normalization constant and the parameter $s$ determines the shape of the distribution. Also, we let $C_i = 30$ and $t_i = 90$ minutes for $1 \leq i \leq 100$, $T_B = 10$ minutes and $\lambda = 10$ requests/minute. Figures 5 and 6 show the revenue and blocking probability ….

G.K. Zipf. Human Behavior and the Principle of Least Effort, Addison-Wesley, 1949.


The Trickle-Down Effect: Web Caching and Server Request.. – Doyle, Chase, Gadde.. (2001)   (15 citations)  (Correct)

….them. We use tracedriven simulation and synthetic traffic patterns to illustrate the trickle down effect and to investigate its implications for downstream components of an end to end content delivery architecture. We focus on Web requests that follow a Zipf like object popularity distribution [18], which many studies have shown to closely model Web request traffic [11, 8, 5, 17] We then illustrate the importance of this effect by examining its impact on load distribution strategies and content cache effectiveness for downstream servers. The contribution of this paper is to demonstrate and ….

G. Zipf. Human Behavior and the Principle of Least Effort. Addison Wesley, 1949.


Modeling, Measurement And Performance Of World Wide Web.. – Barford (2001)   (Correct)

….individual files. The distribution of popularity has a strong effect on the behavior of caches (e.g. buffer caches in the file system) since popular files will typically tend to remain in caches. Popularity distribution for files on Web servers has been shown to commonly follow Zipf's Law [132, 8, 29]. Zipf's Law states that if files are ordered from most popular to least popular, then the number of references to a file ($P$) tends to be inversely proportional to its rank ($r$). That is: $P = k r^{-1}$ for some positive constant $k$. This property is surprisingly ubiquitous and empirical measurements ….

G. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA,


Does AS Size Determine Degree in AS Topology? – Tangmunarunkit, Doyle..   (Correct)

….the dialog to a larger class of explanations for the variability of the AS topology degree distribution. 2 An Alternative Explanation To motivate our explanation, we first note that high variability is the norm in the distribution of sizes of many real world entities. Cities by population size [12, 17], companies by size of income [10] or by size of assets [11] are all known to exhibit power law tails. The distribution of countries or oil reserves by size appears to exhibit a Weibullian distribution [8] In the computing literature, file [15] and Web document [5] sizes have been known to have ….

ZIPF, G. K. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.


The Intelligent Surfer: Probabilistic Combination of Link.. – Richardson, Domingos (2002)   (30 citations)  (Correct)

….Educrawl is a crawl of the web, restricted to .edu domains. The crawler was seeded with the first 18 results of a 1 Google has this feature as well. See http: http://www.google.com technology whyuse.html. 2 This is because the distribution of words in text tends to follow an inverse power law [11]. We also verified experimentally that the same holds true for the distribution of the number of documents a word is found in. 3 It is common to remove stop words such as the, is, etc. as they don t affect the search search for University on Google (www.google.com) Links containing or ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA, 1949.


Execution Performance Issues in Full-Text Information Retrieval – Brown (1995)   (8 citations)  (Correct)

….list size distributions requirements, we need to consider further the size and access characteristics of the data we need to manage. 3.2.1 Inverted List Characteristics The size of an inverted list depends on the number of occurrences of the corresponding term in the document collection. Zipf [94] observed that if the terms in a document collection are ranked by decreasing number of occurrences (i.e. starting with the term that occurs most frequently) there is a constant for the collection that is approximately equal to the product of any given term s frequency and rank order number. The ….

….However, query evaluation is document driven and requires that the inverted lists be sorted by document identifier. Instead, if n is defined to be relatively small, we can maintain a separate list of the documents associated with the top n tf weights for each long inverted list. Zipf's Law [94] suggests that there will be relatively few long inverted lists, but they will consume the majority of the space in the inverted file. If each top document list is constrained to be smaller than a disk page, then the overhead associated with the top document lists will be a small percentage of the ….

Zipf, G. K. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, 1949.


Global Optimization of Histograms – Jagadish, Jin, Ooi, Tan (2001)   (10 citations)  (Correct)

….due to a collection of histograms should be computed the same as that for a histogram: $E^{abs}_{group} = \frac{1}{L} \sum_{i=1}^{L} \alpha^{abs}_i$, $E^{rel}_{group} = \frac{1}{L} \sum_{i=1}^{L} \alpha^{rel}_i$. 4.1 A Comparative Study on Effectiveness. The sources of data sets for our experiments included synthesized zipf data [20], TPC D data [1] and UCI KDD Database Repository [2]. The results of our experiments did not vary significantly for different data sets or different range query sets. To illustrate the benefits of GOH and accuracy of the algorithms proposed, we present test results on two groups of data sets. ….

G. Zipf. Human Behavior and the Principle of Least Effort. Addison Wesley, 1949.


Using Machine Learning To Improve Information Access – Sahami (1999)   (14 citations)  (Correct)

…. say, once or twice (or generally infrequently) in the collection will have little resolving power between documents [166]. The justification for the elimination of such infrequent terms lies in an observation about the frequency of word appearances in corpora made by Zipf over 50 years ago [177]. Since that time, this observation has been named Zipf's Law, although it is not actually a law, but merely an empirical and approximate mathematical phenomenon. To describe Zipf's Law more formally, let us denote the total frequency of a term t in a corpus D by $i_t$. That is, $i_t = \sum_{d \in D}$ ….

Zipf, G. K. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.


Evaluating the Performance of Distributed Architectures for.. – Cahoon, McKinley (1997)   (7 citations)  (Correct)

…. [Figure 5: TIPSTER 1 Query Lengths. Axes: Terms Per Query, Occurrences.] Distribution of Terms in Queries (QTF). Zipf documented the widely accepted distribution of term frequencies in text collections [Zipf, 1949]. In contrast, researchers do not agree on a commonly accepted distribution for term frequencies in queries [Wolfram, 1992]. Figure 6 shows the distribution of our query sets. The query term frequency distributions for the query sets are similar but the distributions do not closely match a well ….

Zipf, G. K. (1949). Human Behavior and the Principle of Least Effort. Addison-Wesley Press.


Model-Based Clustering and Data Transformations.. – Yeung, Fraley..   (10 citations)  (Correct)

….according to a normal distribution with mean 3 and standard deviation 0.5. The quantity (i, j) models the cyclic behavior. Each cycle is assumed to span 8 time points (experiments). The class number is denoted by k, and the sizes of the different classes are generated according to Zipf's Law (Zipf 1949). Different classes are represented by different phase shifts $w_k$, which are chosen according to the uniform distribution in the interval $[0, 2\pi]$. The random variable , which represents the noise of gene synchronization, is generated according to the standard normal distribution. The parameter ….

Zipf, G. K. (1949). Human behavior and the principle of least effort. Addison-Wesley.


A Universal Rule for the Distribution of Sizes – Salingaros, West   (Correct)

….Auerbach s Law, see Lotka (1956) if p is the relative number of genera in a species and x is the rank order of those genera then 1 2, see Willis (1922) or Lotka (1956) if p is the relative frequency of the usage of a word in a language and x is the rank order of that word then 1 1. 5, see Zipf (1949); if p is the relative frequency in the number of authors of a given number of papers published in a year x then = 2, see Lotka (1926) if p is the relative number of purines in a DNA sequence and x is the difference in the number of purines and pyramidines then 1.25, see Allegrini et al. ….

Zipf, G. K., 1949 Human Behavior and the Principle of Least Effort (Cambridge, Massachusetts: Addison-Wesley).


Model-Based Clustering and Data Transformations.. – Yeung, Fraley.. (2001)   (10 citations)  (Correct)

….to a normal distribution with mean 3 and standard deviation 0.5. The quantity models the cyclic behavior. Each cycle is assumed to span 8 time points (experiments). The class number is denoted by k, and the sizes of the different classes are generated according to Zipf's Law (Zipf 1949). Different classes are represented by different phase shifts $w_k$, which are chosen according to the uniform distribution in the interval $[0, 2\pi]$. The random variable , which represents the noise of gene synchronization, is generated according to the standard normal distribution. The ….

Zipf, G. K. (1949). Human behavior and the principle of least effort. Addison-Wesley.


Self-Similarity in the Web – Dill, Kumar, McCurley, Rajagopalan.. (2001)   (18 citations)  (Correct)

….Distributions with an inverse polynomial tail have been observed in a number of contexts. The earliest observations are due to Pareto [38] in the context of economic models. Subsequently, these statistical behaviors have been observed in the context of literary vocabulary [45], sociological models [46], and even oligonucleotide sequences [33], among others. Our focus is on the closely related power law distributions, defined on the positive integers, with the probability of the value i being proportional to $i^{-k}$ for a small positive number k. Perhaps the first rigorous effort to define ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.


Bi-Directional Optimality Theory: An Application of Game Theory – Dekker, van Rooy (2000)   (Correct)

No context found.

Zipf, George Kingsley (1949), Human behavior and the principle of least effort, Cambridge: Addison-Wesley.


Stochastic Models for the Web Graph – Kumar, Raghavan, Sivakumar   (45 citations)  (Correct)

No context found.

G. K. Zipf. Human behavior and the principle of least effort. New York: Hafner, 1949.


Emergence and Levels of Abstraction – Damper   (Correct)

….enormously fruitful in science. Accordingly, the relation between the two is explored before concluding. 2 Emergence: Basic Ideas. To introduce the topic of emergence, let us consider (to cite Schroeder 1991, p. 35) "one of the most surprising instances of a power law in the humanities", namely Zipf's (1949) law according to which the frequency of occurrence, f, of words is (approximately) inversely proportional to their rank, r, for many natural languages. That is: $f \propto 1/r$ and $\log f = -\log r + \text{const}$. Hence, a plot of f against r on double logarithmic axes yields a straight line ….

Zipf, G. K. (1949). Human Behavior and the Principle of Least Effort. Cambridge, MA: Addison-Wesley.


Target-Text Mediated Interactive Machine Translation – Foster, Isabelle, Plamondon (1997)   (6 citations)  (Correct)

….the current prefix to be completed, and pick a single best candidate using the evaluation function. In this section we describe several design features which are essential to performing this operation in real time. 3.2.1. Active and Passive Vocabularies. A well established corollary to Zipf's law (Zipf, 1949) holds that a minority of words account for a majority of tokens in text. To capitalize on this, our system's French vocabulary is divided into two parts: a small active component whose contents are always searched for a match to the current prefix, and a much larger passive part which comes into ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Reading, MA, 1949.


A Text Retrieval Package for the Unix Operating System – Quin (1994)   (Correct)

….that a few words account for almost all of the data, and almost all words occur fewer than ten times. The frequency f of the nth most frequent word is usually given by Zipf's Law: $f = k / (n + m)^s$ [1], where k, m and s are nearly constant for a given collection of documents [Zipf49], [Mand53]. As a result, the optimisation whereby lq-text packs the first half dozen or so matches into the end of the fixed size record for that word, filling the space reserved for storing long words, is a significant saving. On the other hand, the delta encoding gives spectacular savings for ….

Zipf, George K., Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge,


Packing Schemes for Gang Scheduling – Feitelson (1996)   (42 citations)  (Correct)

….that included user and job information) are shown in Fig. 6. It is seen that some sequences are extremely long (the maximum observed is 402 runs on the ANL SP1). The fact that the slope is a straight line in these log-log plots indicates a generalized Zipf distribution (i.e. $p(n) \propto 1/n^{\theta}$) [30, 26]. Using linear regression, the harmonic order ($\theta$ in the equation for the probability distribution) is around 2.2 for both cases, after deleting outliers that appear only a small number of times. Similar results were obtained for the Cornell trace [15]. 3.4 Job Classes. In many cases jobs in a ….

G. K. Zipf, Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.
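
Note on the Feitelson excerpt above: the exponent of a power law can be estimated as the slope of a least-squares line through the points on log-log axes. The following is a minimal Python sketch of that idea, not the paper's actual procedure. The function name `fit_loglog_slope` and the synthetic data in the usage note are assumptions made for illustration.

```python
import math


def fit_loglog_slope(ranks, values):
    """Ordinary least-squares fit of log(value) against log(rank).

    Returns (slope, intercept). For data following value = C / rank**s,
    the slope is -s and the intercept is log(C).
    Hypothetical helper: name and signature are illustrative only.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For example, feeding it synthetic values `1000 / n**2.2` for ranks 1 to 50 recovers a slope close to -2.2. Real traces are noisier, and the excerpt mentions deleting outliers before fitting, which this sketch does not attempt.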


MetaBank: A Knowledge-Base of Metaphoric Language Conventions – Martin (1991)   (4 citations)  (Correct)

….of the clusters. This plot shows that a small number of metaphors account for a large number of instances, with the curve tailing off significantly to extremely low frequency metaphors. It appears, therefore, that the conventional metaphors in this corpus closely follow the Zipf's Law (Zipf, 1949) behavior observed in so many domains. This Zipf-like behavior has potential implications for both computational and psychological processing models of metaphor understanding. The pattern of memory access for conventional metaphors or for source analogs is far from uniform. ….

Zipf, G. (1949). Human Behavior and the Principle of Least Effort. Addison–Wesley, Cambridge, MA.


Concept Decompositions for Large Sparse Text Data using.. – Dhillon, Modha (2000)   (30 citations)  (Correct)

….Further evidence of self-similarity is provided by our concept vector plots (Figures 7 and 12) that demonstrate that word counts within and outside of each cluster have the same general distribution. It is well known that word count distributions in text collections obey a certain Zipf's law (Zipf, 1949). Our results suggest the possibility that Zipf's law may hold in a recursive fashion, that is, for each cluster within a collection, and for each sub-cluster within a cluster, and so on. One of our main findings is that concept decompositions that are derived from concept vectors can be used for ….

Zipf, G. K.: 1949, Human Behavior and the Principle of Least Effort. Addison Wesley, Reading, MA.


The Scaling of Fluvial Landscapes – Birnir, Smith, Merchant   (Correct)

….in a variety of physical contexts. Important goals of this literature are to characterize the qualitative behavior of high and infinite dimensional, nonlinear systems that are driven by noise and to explain the origin of temporal and spatial scaling behavior in a wide variety of phenomena [26, 13, 18, 7]. It is now known that the evolution of surfaces whose dynamics are driven by various forms of noise are frequently characterized by scaling laws. Such laws often, but not always, indicate that the system is insensitive to differences in the details of the underlying mechanisms and processes. This ….

Zipf G.K. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA, 1949.


Global Memory Management For Multi-Server Database Systems – Venkatarman (1996)   (2 citations)  (Correct)

….times were well within 2 percent of the mean in all cases. 5.3.1 Workload. To simulate workloads experienced by web servers, we closely study the logs maintained at several web sites. We find that the access frequencies to files at these web sites closely resemble a Zipfian distribution [Zip49]. If the number of accesses to the files is T, the number of files is F, and the skew is z, the frequency for accessing a file i generated by a Zipfian distribution is given by the following formula: $t_i = T \cdot \frac{1/i^z}{\sum_{j=1}^{F} 1/j^z}$ for all $1 \le i \le F$. We demonstrate this with the logs ….

G. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.
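
Note on the Venkatarman excerpt above: the formula splits T total accesses over F files so that file i receives a share proportional to 1/i^z. A small Python sketch of that calculation follows. The function name `zipf_access_counts` is an assumption for illustration and does not come from the thesis.

```python
def zipf_access_counts(total_accesses, num_files, skew):
    """Expected accesses per file under a Zipfian distribution.

    Implements t_i = T * (1 / i**z) / sum_{j=1..F} (1 / j**z),
    with T = total_accesses, F = num_files, z = skew.
    Hypothetical helper: name and signature are illustrative only.
    """
    weights = [1.0 / (i ** skew) for i in range(1, num_files + 1)]
    norm = sum(weights)
    # Element 0 of the result corresponds to file i = 1, the most popular.
    return [total_accesses * w / norm for w in weights]
```

With `skew = 0` every file gets the same share, and larger values of `skew` concentrate accesses on the first few files. The counts are fractional expectations, so a simulator would still need to round or sample from them.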


Characterizing Web Workloads to Improve Performance – Wolman (1999)   (2 citations)  (Correct)

….is a function that ranks the access frequency of documents. George Zipf, a professor of linguistics at Harvard, published a book in 1949 that showed a large number of examples from social and economic data where the relationship between rank and frequency followed a particular distribution[Zipf 49] In particular, he observed that the frequency of some event, as a function of the rank i of that event, is proportional to 1=i , where is a constant close to one. This became known as Zipf s law, and probability distributions that follow this formula are often referred to as Zipf like, ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.


Incremental Document Clustering for Web Page Classification – Wong, Fu (2000)   (4 citations)  (Correct)

….be useful in Web Domain. 3.2 Document Feature Extraction. Luhn [13] proposed that the frequency of word occurrences in an article furnishes a useful measurement of word significance. Let us order the words by their frequency of occurrences, resulting in the rank order. According to Zipf's Law [25] (see also [5]), the product of the frequency of use of words and the rank order is approximately constant. Traditionally, a document D is represented by a feature vector of the form $(d_1, \ldots, d_n)$, where $d_i$ is the numeric weight for the i-th feature and n is the total number of features. ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, Massachusetts, 1949.


The Golden Estimator: Efficient Range Query Estimation – Wu, Agrawal, Abbadi (2000)   (Correct)

….to test the relative benefits of the different sampling approaches identified. The setting of this experiment is as follows. The domain of the attribute values is from 0 to $2^{16} - 1$. The number of tuples in the relation is $10^5$. We employ the Zipf distribution in this experiment [Zip49, Poo97] with z = 0.1. Figure 4 shows an example fdf and cdf of one such attribute in the given relation. The vertical lines on the x axis correspond to the frequencies associated with each attribute value and the dashed line is the resulting cumulative frequency distribution. Figure 5(a) shows ….

….data in the real world, we use the Zipf distribution for most of the experiments (except in Section 3.3). To ensure that there is no correlation between attribute values and their frequencies, we randomly assign the Zipf frequencies to the attribute values. We use the algorithm described in [Zip49] for generating the Zipf distribution. In the experiments we explore the performance of the different methods on query sets with different selectivities. The selectivity sel of query $q(a \le X \le b)$ is defined as $sel = (b - a)/N$, which is the percentage of the query range to the entire ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Reading, MA, 1949.


Class-Oriented Page Invalidation for Caching Dynamic Web Content – Zhu, Yang   (Correct)

….7: Comparison of server CPU utilization when accessed by 60 clients. (a) CPU utilization for the caching only case. (b) CPU utilization for selective precomputing. (c) Sorted CPU utilization for both cases. …. ordered from hot to cold (i.e. receiving least bids) also follow Zipf distribution [31]. Figure 8(a) shows a typical distribution on June 1, 1999, which follows Zipf distribution. The X axis is the item ranks in a logarithmic scale sorted in a decreasing order of the number of bids received. The Y axis is the number of bids received in a logarithmic scale. ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA, 1949.


Class-based Cache Management for Dynamic Web Content – Huican Zhu And (2000)   (17 citations)  (Correct)

…. we have collected the numbers of daily bids on featured items at eBay from May 28, 1999 to June 8, 1999 and have observed that for items whose bidding period will end in a day, the numbers of bids on them ordered from hot to cold (i.e. receiving least bids) also follow Zipf distribution [33]. Figure 9(a) shows a typical distribution on June 1, 1999, which follows Zipf distribution with $\alpha = 0.3$. The X axis shows item ranks in logarithmic scale based on number of bids received. The Y axis shows the number of bids received in logarithmic scale. ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA, 1949.


Some Aspects of Optimality in Natural Language Interpretation – Blutner (1999)   (14 citations)  (Correct)

….to integrate optimal interpretation and optimal production. A look on the area of pragmatics seems to be useful since an analogous optimality metric plays an indispensable role there. The Gricean conversational maxims are widely recognized as a (rather informal) expression of this metric. With Zipf (1949) as a forerunner we have to acknowledge two basic and competing forces, one force of unification, or Speaker's economy, and the antithetical force of diversification, or Auditor's economy. The two opposing economies are in extreme conflict, and we have to look for an optimal way to resolve this ….

Zipf, G.K. (1949), Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge.


Efficient Filtering of XML Documents for Selective.. – Altinel, Franklin (2000)   (43 citations)  (Correct)

….a wildcard operator. F and S are used to control the presence and characteristics of filters in the queries. F determines which level of a query (if any) will contain a filter, and S specifies the selectivity of such a filter if it does exist. Finally is the parameter of the zipf distribution [Zip49] that is used to determine the skewedness of the choice of element names at each level in query generation. When it is 0, each element name in the query is selected randomly from the set of element names allowed at its level with a uniform distribution, whereas at a setting of 1, the choice is ….

G. K. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge, Massachusetts, 1949.


The Web as a graph – Kumar, Raghavan, Rajagopalan.. (2000)   (24 citations)  (Correct)

….HITS method [23], the enumeration of certain bipartite cliques [25], and classification algorithms utilizing hyperlinks [14]. In Section 3 we summarize a number of measurements on large portions of the Web graph. We show that the in- and out-degrees of nodes follow inverse polynomial distributions [15, 20, 29, 33], and we study the distributions of more complex structures. We present measurements about connected component sizes, and from this data we draw conclusions about the high level structure of the web. Finally, we present some data regarding the diameter of the web. In Section 4 we show that the ….

G. K. Zipf. Human behavior and the principle of least effort. New York: Hafner, 1949.


Supporting Full-Text Information Retrieval with a.. – Brown, Callan, Croft.. (1994)   (17 citations)  (Correct)

…. [Figure 1: Cumulative distribution of inverted list sizes for the Legal collection, in terms of both total number of records and total file size. Axes: Inverted List Record Size (bytes); cumulative share of Records and of File Size.] …. occurrences of the associated term in the document collection. Zipf [22] observed that if the terms in a document collection are ranked by decreasing number of occurrences (i.e. starting with the term that occurs most frequently), there is a constant for the collection that is approximately equal to the product of any given term's size and rank order number. The ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, 1949.


Resource Allocation for stor-serv : Network Storage Services with .. – Chuang (1999)   (4 citations)  (Correct)

….Q, where all objects $q_k \in Q$ are of identical size. The collection has a uniform spatial demand distribution, i.e. $g_v(i) = 1/|V|$ for all $v_i \in V$ (18). However, the collection has a non-uniform demand distribution across its objects. Specifically, the object access pattern obeys Zipf's distribution [17]: $g_q(k) = C_2 / k$ (19), where $C_2$ is a constant such that condition (4) is satisfied. Figure 4 shows, for a multi-object collection, the efficiency gains of a mapping solution based on partial replication over one based on full replication. The full replication solution is constrained in two ….

Zipf, G.K. Human behavior and the principle of least effort. Cambridge MA: Addison-Wesley, 1949.


The Kendra Cache – Mccann Howlett Crane   (Correct)

…. improve performance if caching is limited to single client or single sites, whereas hierarchically ordered caches extending beyond the level of user or site have [3, 6]. Patterns of access to Web resources have been observed to follow Zipf's Law, which in its original context of natural language [14], states that one can rank the popularity $\rho$ of a word in a given text in terms of its frequency of use P: $P \propto 1/\rho$. Plotting this distribution using logarithmic scales results in a straight line, as shown in figure 1. In terms of web access this simply means that a small number of popular ….

G. K. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley,


Distributed Adaptive Multimedia Delivery in Kendra – McCann, Howlett, Crane   (Correct)

….through caching does not significantly improve performance if caching is limited to single client or single sites [4, 7], whereas hierarchically ordered caches extending beyond the level of user or site have. Prefetch. Patterns of access to Web resources have been observed to follow Zipf's Law [17]: a small number of popular resources will be accessed with by far the greatest frequency [1], suggesting that resource retrieval is strongly predictable. This motivates proactive movement of resources within the cache hierarchy to reflect demand, both observed and predicted. 5.1 Policies To ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA, 1949.


DASD Dancing: A Disk Load Balancing Optimization Scheme for .. – Wolf, Yu, Shachnai (1995)   (26 citations)  (Correct)

….2 to compute our load balancing goals. As noted, this problem can be solved trivially. Next we describe the structure of the simulation experiments themselves. We choose hour of day arrival patterns according to a Zipf-like distribution with N = 24 and $\theta$ = 3. Briefly, a Zipf-like distribution [21, 10] takes two parameters, N and $\theta$, the latter corresponding to the degree of skew. The distribution is given by $p_i = c / i^{1-\theta}$ for each $i \in \{1, \ldots, N\}$, where $c = 1 / \left[ \sum_{i=1}^{N} 1/i^{1-\theta} \right]$ is a normalization constant. Setting $\theta = 0$ corresponds to a pure Zipf distribution, which is highly ….

G. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, 1949.


Parallel Query Processing – Yu, Chen, Wolf, Turek (1993)   (6 citations)  (Correct)

No context found.

G. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, MA, 1949.


Efficient Generation of Descriptions in Context – Emiel Krahmer Mari (1999)   (Correct)

No context found.

Zipf, G.K. (1949), Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge, MA.


Semantics and Complexity of Question Answering.. – Bagga, Zadrozny.. (1999)   (Correct)

No context found.

G.K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Cambridge,MA, 1949.


Integration of Continuous Speech Recognition and Information.. – Siegler (1999)   (1 citation)  (Correct)

No context found.

G. Zipf, Human behavior and the principle of least effort, Addison-Wesley, Cambridge, Mass., 1949.


The CANDID Video-on-Demand Server – Soloviev, Rousskov (1995)   (Correct)

….of 4 one hour movies per disk. Movie length is chosen to be one hour to limit the simulation time. Thus, there are 64 movies total stored in a 16 disk server configuration with each movie declustered across all disks. Movies are selected randomly using a Zipfian distribution with z = 1.0 ([Zipf49]). The system supports a fixed number of terminals in an experiment, called #clients. Movie starts are randomly distributed over a one hour time interval in the beginning of experiments. When a movie is finished the terminal running the movie is deleted. At this event, a new request is issued for ….

Zipf, G., Human Behavior and the Principle of Least Effort, Addison-Wesley, Reading, MA, 1949.
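
Note on the CANDID excerpt above: selecting movies "randomly using a Zipfian distribution" can be simulated by weighted sampling over ranks. The Python sketch below shows one way to do it with the standard library. The function names `zipf_weights` and `pick_movies`, the seed, and the request count are assumptions for illustration, since the excerpt does not describe the authors' generator.

```python
import random


def zipf_weights(n, z):
    """Unnormalized Zipfian weights 1 / rank**z for ranks 1..n."""
    return [1.0 / (rank ** z) for rank in range(1, n + 1)]


def pick_movies(num_movies, z, num_requests, seed=0):
    """Draw movie ranks (1 = most popular) with Zipfian probabilities.

    Hypothetical helper: random.choices normalizes the weights itself,
    and the fixed seed only makes the run repeatable.
    """
    rng = random.Random(seed)
    weights = zipf_weights(num_movies, z)
    return rng.choices(range(1, num_movies + 1), weights=weights, k=num_requests)
```

Calling `pick_movies(64, 1.0, 10000)` mirrors the 64 movies and z = 1.0 from the excerpt: the top-ranked movie is requested far more often than the last one.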


A One-Pass Space-Efficient Algorithm for Finding Quantiles – Agrawal, Swami (1995)   (6 citations)  (Correct)

….a version of this heuristic for n = 1. 3 Empirical Evaluation. We conducted several experiments to empirically assess the behavior of our algorithm. We first show the results of experiments for X generated according to two distributions: the uniform distribution and the Zipf distribution [12]. For the Zipf distribution, we choose the Zipf parameter to be 0.86, which corresponds to the 80-20 distribution. We also experimented with other distributions by choosing different values for the Zipf parameter and found similar results. The number of values in X ($\|X\|$) was one of {1 million, 2 ….

G. K. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, Reading, MA, 1949.


Transient Dynamics and Scaling Phenomena in Urban Growth – Susanna Manrubia   (Correct)

….and challenging similarities. In fact, despite the clear differences in their small scale details, urban settlements show well defined generic features that strongly suggest the existence of common universal laws of city growth and morphological organization. One of the best known is Zipf's law [2], which states that the fraction of cities f(n) with n inhabitants shows a definite power law dependence $f(n) \propto n^{-r}$ with $r \approx 2$. In its original form, this law represents the population P(R) of cities as a function of the rank R of the city: R = 1 is assigned to the largest city, R = 2 to ….

….the largest city, R = 2 to the second largest, and so on. Remarkably enough, the observed dependence $P(R) \propto R^{-1}$ does not depend on cultural, social or historical factors or on short and long term economic or political plans. In his book Human behavior and the principle of least effort [2], Zipf reports about communities of 2500 or more inhabitants for the USA in the period 1790-1930. The net population grew extremely fast in that period, but the profile of the function P(R) was always maintained. Also the distribution f(a) of the fraction of cities with area a seemingly presents ….

[Article contains additional citation context not shown here]

G.K. Zipf, Human Behavior and the Principle of Least Effort (Addison-Wesley, Cambridge MA, 1949).


Deriving Part of Speech Probabilities from a Machine-Readable.. – Coughlin (1996)   (1 citation)  (Correct)

….few languages. Other sources of part of speech probabilities would prove useful for languages without these valuable resources. The remainder of this paper explores the use of MRDs as a potential source of part of speech probabilities. 2 Dictionary as a Source of Part of Speech Probabilities. Zipf [16] found that on average, the more frequently a word occurs, the more senses it had (in the dictionary). Extending Zipf's law a bit further, we postulate that for polysemous words, the more frequent a part of speech, the more senses that part of speech will have. The word follow, for example, has ….

Zipf, G.K., 1949, Human Behavior and the Principle of Least Effort. Cambridge: Addison-Wesley Press, Inc.


Supporting Full-Text Information Retrieval with a Persistent.. – Brown (1994)   (17 citations)  (Correct)

….given the most careful consideration. Efficient lookup requires knowing the size distribution of the records in the file and a characterization of the record access patterns. The size of an inverted list depends on the number of occurrences of the associated term in the document collection. Zipf [23] observed that if the terms in a document collection are ranked by decreasing number of occurrences (i.e. starting with the term that occurs most frequently), there is a constant for the collection that is approximately equal to the product of any given term's frequency and rank order number. The ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, 1949.


Index Structures for Information Filtering Under the Vector .. – Yan, Garcia-Molina (1993)   (9 citations)  (Correct)

….thus obtained is a term. Next we measured the occurrence frequency of each term in the database, obtaining the plot shown in Figure 4 (note the log-log scale). The straight line in the graph was derived by curve fitting using [16]. We can see the database does demonstrate Zipfian characteristics [19]. The x intercept (i.e. size of the term vocabulary, which we denote by v) is found to be 521,915. Also, the average number of words per document (denoted by d) is found to be 323. Hence, we adopt the following probabilistic document model, which is similar to the one in [15]. The terms in a ….

ZIPF, G.K. Human Behavior and the Principle of Least Effort, Addison-Wesley Press, Cambridge, Massachusetts, 1949.


Semantic Lexicon Acquisition for Learning Natural Language.. – Thompson (1998)   (6 citations)  (Correct)

….in the geography database domain: none of the hand built meanings for phrases in that lexicon had functors embedded in arguments. A grammar was used to generate utterances and their meanings from each original lexicon, with terminal categories selected using a distribution based on Zipf's Law (Zipf, 1949). Under Zipf's Law, the occurrence frequency of a word is inversely proportional to its ranking by occurrence. We started with a baseline corpus generated from a lexicon of 100 words using 25 conceptual symbols and no ambiguity or synonymy; 1949 sentence meaning pairs were generated. We split this ….

Zipf, G. (1949). Human behavior and the principle of least effort. Addison-Wesley, New York, NY.


A Bayesian Approach to Filtering Junk E-Mail – Sahami, Dumais, Heckerman.. (1998)   (51 citations)  (Correct)

….reduction helps provide an explicit control on the model variance resulting from estimating many parameters. Moreover, feature selection also helps to attenuate the degree to which the independence assumption is violated by the Naive Bayesian classifier. We first employ a Zipf's Law based analysis (Zipf 1949) of the corpus of E-mail messages to eliminate words that appear fewer than three times as having little resolving power between messages. Next, we compute the mutual information $MI(X_i; C)$ between each feature $X_i$ and the class C (Cover & Thomas 1991) given by $MI(X_i; C) = \sum_{X_i = x_i, C = c}$ ….

Zipf, G. K. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley.
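
Note on the Sahami et al. excerpt above: the first pruning step, dropping words that appear fewer than three times in the corpus, is easy to sketch. The Python below is an illustrative assumption, not the authors' code. The function name `prune_rare_words` is hypothetical, and lowercasing plus whitespace splitting stands in for whatever tokenizer the paper used.

```python
from collections import Counter


def prune_rare_words(messages, min_count=3):
    """Return the set of words occurring at least min_count times overall.

    Hypothetical helper: tokenization is a naive lowercase whitespace
    split, chosen only to keep the example self-contained.
    """
    counts = Counter(
        word for message in messages for word in message.lower().split()
    )
    return {word for word, count in counts.items() if count >= min_count}
```

Because of the long Zipfian tail, a threshold this low already removes a large share of the distinct words while touching few of the total tokens, which is why it works as a cheap first filter before the mutual information step.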


An Information-theoretic Solution to Parameter Setting – Eric Brill (1993)   (Correct)

….In the worst case, our method could require on the order of $n^2$ additional storage, where n is the size of the vocabulary, beyond that already employed to store the vocabulary. This is the storage required for keeping statistics on all possible word pair cooccurrences. However, due to Zipf's law (Zipf 1949), the actual storage requirements will be much less. In Brill (1993b) experiments show an empirical upper bound for the storage requirements at about $3n$, and since the only word pairs considered are those where one of the words is in the list of 20 verbs, this number will be much smaller. 3.5 ….

Zipf, G. Human behavior and the principle of least effort. New York: Hafner Pub. Co. 1949.


An Experimental Comparison of Several Clustering and.. – Meila, Heckerman (1998)   (22 citations)  (Correct)

….We shall use $X_i$ to refer both to a particular story and its corresponding variable. A preliminary clustering analysis of this dataset, using both EM and CEM with random initialization, showed the following. 1) There were approximately 10 clusters. 2) The size of clusters followed Zipf's law (Zipf, 1949). That is, the probabilities $P(\mathrm{class} = k)$, $k = 1, \ldots, K$, when sorted in descending order, showed a power law decay. 3) The marginal probabilities of story hits also followed Zipf's law. That is, the probabilities $P(X_i = \mathrm{hit})$ for all stories, when sorted in descending order, showed a ….

Zipf, G. (1949). Human behavior and the principle of least effort. Addison-Wesley.


Disk Load Balancing for Video-on-Demand Systems – Wolf, Yu, Shachnai (1997)   (14 citations)  (Correct)

….goals. As noted, this problem can be solved trivially, because the disks are homogeneous. Next we describe the structure of the simulation experiments themselves. We choose hour-of-day arrival patterns according to a Zipf-like distribution with $N = 24$ and $\theta = 3$. Briefly, a Zipf-like distribution [24, 13] takes two parameters, $N$ and $\theta$, the latter corresponding to the degree of skew. The distribution is given by $p_i = c / i^{1-\theta}$ for each $i \in \{1, \ldots, N\}$, where $c = 1 / \left[\sum_{i=1}^{N} 1/i^{1-\theta}\right]$ is a normalization constant. Setting $\theta = 0$ corresponds to a pure Zipf distribution, which is highly ….

G. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, 1949.
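
A minimal sketch of the Zipf-like distribution described in this context, p_i = c / i^(1 - skew). The function name zipf_like and the parameter name theta are assumptions for illustration (the skew symbol is lost in the extracted text); the normalization is the one stated in the excerpt.

```python
def zipf_like(n, theta):
    """Return [p_1, ..., p_n] with p_i = c / i**(1 - theta).

    theta = 0 gives a pure Zipf distribution (p_i proportional to 1/i);
    theta = 1 gives a uniform distribution.
    """
    weights = [1.0 / i ** (1.0 - theta) for i in range(1, n + 1)]
    c = 1.0 / sum(weights)  # normalization constant
    return [c * w for w in weights]


# Example: 24 hour-of-day slots with a pure Zipf skew.
hourly = zipf_like(24, 0.0)
```

With theta = 0 the most popular slot is exactly twice as likely as the second and three times as likely as the third, which is a quick sanity check on any implementation.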


Performance Analysis of Distributed Information Retrieval.. – Cahoon, McKinley (1995)   (2 citations)  (Correct)

…. [Figure 6: Query Term Frequency Distributions. Axis: cumulative term frequency; series: Tipster 1 queries, 103rd Congressional Record queries, CACM queries.] Query Term Frequency Distribution. Zipf documented the widely accepted distribution of term frequencies in text collections based upon empirical measurements [Zip49]. In contrast, the distribution of term frequencies in queries is more difficult to characterize and researchers do not agree on a commonly accepted distribution [Wol92]. Figure 6 shows the query term frequency distributions for our query sets. The shapes of the distributions for the three query ….

G.K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, 1949.


Usage Patterns of a Web-Based Image Collection – Talagala, Asami, Patterson   (Correct)

….are very rarely accessed, if ever. Although the site has offered 20,000 to 59,000 images over the five month period, only 5451 unique images were actually retrieved over the five month period. Prior studies of web traffic have found that web document popularity follows Zipf's Law [12]. Zipf's Law [13,14], originally applied to the relationship between a word's popularity rank and its frequency of use, states the following: the frequency of occurrence of some event (P) as a function of the rank (i), when the rank is determined by the above frequency of occurrence, is a power law function $P_i$ .

G.K. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, 1949.


N-Gram-Based Text Categorization – Cavnar, Trenkle (1994)   (4 citations)  (Correct)

….their similarity that is resistant to a wide variety of textual errors. 3.0 Text Categorization Using N-Gram Frequency Statistics. Human languages invariably have some words which occur more frequently than others. One of the most common ways of expressing this idea has become known as Zipf's Law [6], which we can re-state as follows: The nth most common word in a human language text occurs with a frequency inversely proportional to n. The implication of this law is that there is always a set of words which dominates most of the other words of the language in terms of frequency of use. This ….

Zipf, George K., Human Behavior and the Principle of Least Effort, an Introduction to Human Ecology, Addison-Wesley, Reading, Mass., 1949.
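
The restatement above (the nth most common word occurs with frequency inversely proportional to n) implies that rank times frequency stays roughly constant. A small sketch for checking this on a token list; the function name rank_frequency and the tuple layout are assumptions for illustration.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, frequency, rank * frequency) rows.

    Words are ranked from most to least frequent; under Zipf's law the
    last column should stay roughly constant.
    """
    ranked = Counter(tokens).most_common()
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(ranked, start=1)
    ]


# Toy input constructed so that frequency is exactly 6 / rank.
rows = rank_frequency(["the"] * 6 + ["of"] * 3 + ["and"] * 2)
```

On real text the product drifts, especially at the very top and in the long tail, so the law is best read as an approximation.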


Multicast Group Behavior in the Internet’s Multicast Backbone .. – Almeroth, Ammar (1997)   (23 citations)  (Correct)

….some members are part of the group for many days. For membership duration data, we use a Zipf distribution which works when a large percentage of (duration) samples are concentrated at the beginning of the range while the remaining percentage are widely dispersed over the remainder of the spectrum [10]. This is exactly the type of behavior exhibited. [Figure panels: observed inter-arrival data (mins), Feb 7th STS-63; observed duration data, Feb 7th STS-63; sample exponential data, inter-arrival (mins); sample Zipf data ….]

G. Zipf, Human Behavior and the Principle of Least Effort. Reading, MA: Addison-Wesley, 1949.


Maintaining Library Catalogues with an RDBMS: A.. – Balownew, Bode..   (Correct)

…. [Fig. 6. (Part of) Evaluation plan for strategies I and RI] kind of query modification is of any influence on the query results. This theme will be discussed in more detail in the next section. 7 A Note on Stop Words. An observation that is commonly attributed to Zipf [23] states that a significant amount of the tokens in a document are made up by a small number of words (e.g. articles like THE). In many text retrieval systems this skewed keyword distribution is used to support architectural decisions like the following (taken from Fox [9]): many of the most .

G.K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.


When Do Finite Sample Effects Significantly Affect Entropy.. – de Wit   (Correct)

….theorem, the entropy is related to the typical words in the limit where $N \to \infty$; the contribution of rare words progressively disappears as N increases. In some sense this observation justifies the procedure to be described below. It was noted by Pareto [7], Zipf [8] and others, and later interpreted by Mandelbrot [9], that the tail of the Zipf-ordered distribution $n_k$ tends to follow a universal scaling law $n_k = \alpha k^{-\gamma}$, $\gamma > 0$ (4). Such a scaling does not imply any particular self-organization and hence its physical meaning should not be .

G. Zipf, Human behavior and the principle of least effort (Addison-Wesley, Cambridge MA, 1949).


Computing Iceberg Queries Efficiently – Min Fang (1998)   (40 citations)  (Correct)

….through array A, and setting BITMAP_1[i] if bucket i is heavy, i.e. if $A[i] \ge T$. [Footnote 2: The 80-20 rule is an instance of high skew. When the rule applies, a very small fraction of targets account for 80% of tuples in R, while the other targets together account for the other 20% [Zip49].] We compute BITMAP_1 since it is much smaller than A, and maintains all the information required in the next phase. After BITMAP_1 is computed, we reclaim the memory allocated to A. We then compute F by performing a candidate selection scan of R, where we scan R, and for each target v whose BITMAP_1[h_1 ….

G.K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Cambridge, Massachusetts, 1949.


Balancing Histogram Optimality and Practicality for Query.. – Ioannidis, Poosala (1995)   (61 citations)  (Correct)

….set of $R_j$ and may contain duplicates. Example 2.1. A common claim is that, in many attributes in real databases, there are few domain values with high frequencies and many with low frequencies [3, 6]. Hence, for most examples in this paper, frequency distributions follow the Zipf distribution [24], which has exactly the above property. For a relation size $T$ and domain size $M$, the frequencies generated by the Zipf distribution are $t_i = T \cdot \frac{1/i^z}{\sum_{i=1}^{M} 1/i^z}$ for all $1 \le i \le M$. (1) Figure 1 is a graphical representation of (1) for $T = 1000$, $M = 100$, and $z = 0, 0.02, 0.1$, where the ….

….sets). We use five types of histograms: trivial, optimal serial, optimal end-biased, equi-width and equi-depth histograms, with the number of buckets $\beta$ ranging from 1 to 30. In most experiments, the frequency sets of all relations follow the Zipf distribution, since it presumably reflects reality [3, 6, 24]. Its z parameter takes values in the range [0.01 - 3.0], which allows experimentation with varying degrees of skewness. The size M of the join domains ranges from 10 to 100. 5.1 Effect of Histogram Type. In this subsection, we compare the error generated by the five types of histograms ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Reading, MA, 1949.


Style-Shifting in Oral Interlanguage: Quantification and Definition – Dewaele (1995)   (Correct)

….style shifting (5) 5. Conclusion. At this stage an answer can be put forward about the motivations of speakers who style-shift. We think the answer lies primarily in a decision about optimal communication. Informal implicit speech is much more economical than its explicit, formal variant. In Zipf's (1949) terms, speakers respect the principle of least effort, and we can add, as long as it does not violate the maxim of quantity that every cooperative speaker is supposed to respect in a particular situation (Grice, 1975). This means that a speaker will, if possible, avoid a formal style which is .

Zipf, G.K. 1949. Human behavior and the principle of the least effort, Cambridge, Mass: Addison-Wesley.


Scalable Delivery of Web Pages Using Cyclic Best-Effort (UDP).. – Almeroth (1998)   (29 citations)  (Correct)

….large number of potential clients that make requests according to a Poisson process and that all requested pages are of the same size [footnote 3]. Each server will have a number of pages L that it can serve. We assume that the probability that a request is for a particular page follows a Zipf distribution [19]. That is, if we label the pages in decreasing order of popularity, the probability that a particular request is for page i is given by $c/i$, where $c$ is the normalization constant $\left[\sum_{i=1}^{L} 1/i\right]^{-1}$. We use the time to transmit a page in chunks (i.e. the time for transmitting one cycle) as our ….

G. Zipf, Human Behavior and the Principle of Least Effort. Reading, MA: Addison-Wesley, 1949.


Heavy-Tailed Probability Distributions in the World Wide Web – Crovella, Taqqu, Bestavros (1998)   (60 citations)  (Correct)

….$0.9 \le \alpha \le 1.1$. Thus our results indicate that with respect to the upper tail distribution of file sizes, Web traffic does not differ significantly from the more general case of FTP traffic. 3.5 Zipf's Law. Another instance of power law distributions in our data occurs as an instance of Zipf's law [Zip49] (discussed in [Man83]). Zipf's law was originally applied to the relationship between the number of references made to a word in a given text, and its order in a ranking based on the same measurement. It states that if one ranks the words used in a given text by their popularity P (frequency of ….

G. K. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, 1949.


Storage Estimation for Multidimensional Aggregates .. – Shukla.. (1996)   (49 citations)  (Correct)

….1 has a one level hierarchy. Dimension 0 has 1000 distinct values, and its hierarchies have 200 and 50 values respectively, while dimension 1 has 10,000 distinct values, and its hierarchy has 500 values. The database is a combination of distinct values of all dimensions. A Zipfian distribution [Zipf49] was used to generate the database from the distinct values of each dimension. A Zipf value of 0 means that the data is uniformly distributed. By increasing Zipf, we increase the skew in the distribution of distinct values in the database. The mapping from the distinct values in a dimension to its ….

G.K. Zipf. Human Behavior and the Principle of Least Effort, Addison-Wesley, Reading, MA, 1949.


Determining the Optimal File Size on Tertiary Storage Systems.. – Luis Bernardo (1998)   (1 citation)  (Correct)

….then as a check of the above formula. Another point worth mentioning about Figure 5 is the fact that the larger the file splitting overhead o, the more important it becomes in determining the optimal file size. 4.2. A Zipf-like query size distribution. As a second example we consider the Zipf-like [10] distribution D(q; Q; α) A 1 αQ q ; (4) [Figure 6. Query execution time as a function of n for three different delta query sizes, L=N, N = 10, 30, 50. x-axis: number of equal size files.] Here the file ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison Wesley Press, Inc., Cambridge, 1949.


Dynamic Load Balancing in Geographically Distributed.. – Colajanni, Yu.. (1998)   (12 citations)  (Correct)

…. that in average 75% of the client requests come from only 10% of the domains [2]. For this reason, we assume that clients are partitioned among the domains based on a Zipf's distribution, that is, a distribution where the probability of selecting the i-th domain is proportional to $1/i^{(1-x)}$ [8]. One important consideration in dealing with the problem of nonuniform distribution of requests and limited control of the DNS is the kind of information that is needed by the scheduler. For homogeneous servers in [4] it was found that the following actions improved performance: estimating the ….

….to be exponentially distributed [2] The parameters to be considered in our simulations fall under five categories. They are presented in Table 1 with their default values shown between parentheses. We assume that clients are partitioned among the K domains on a pure Zipf s distribution basis [8]. We use the maximum difference between the relative server capacities to denote the four levels of server heterogeneity considered in the study. Assuming N = 7, the relative server capacities f 0 i sg are given in Table 2 for the four cases. To allow a fair comparison among the performance of ….

G.K. Zipf, Human Behavior and the Principles of Least Effort, Addison-Wesley, Reading, MA, 1949.


Collecting and Modeling the Join/Leave Behavior of Multicast.. – Almeroth, Ammar (1996)   (12 citations)  (Correct)

….does not work because several people join for very long periods. Instead, we use a Zipf distribution which works when a large percentage of (duration) samples are concentrated at the beginning of the range while the remaining percentage are widely dispersed over the remainder of the spectrum [17]. This is exactly the type of behavior exhibited, so a Zipf distribution fits very well. For short sessions, because the maximum membership duration is much shorter and there are no long durations, an exponential function can be used. However, there is still a small tail. 4.3 Spatial Analysis. We ….

G. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Reading, MA, 1949.


Towards Intelligent Virtual Environment for Training.. – Alo, Aló.. (1999)   (Correct)

….doctor can be thus characterized by the number t of the situations in which this doctor is well skilled. Some situation types are more frequent, some are less frequent. To estimate frequencies of different situations, we can use a general (semi-empirical) law discovered by G. K. Zipf (see, e.g. [5, 7]) according to which, if we order types from the most frequent to the least frequent one, then the frequency $f_i$ of the i-th type is proportional to $1/i$: $f_i = c/i$ for some constant c. The value of this constant can be determined from the fact that the sum of all these frequencies should be equal to ….

G. K. Zipf, Human behavior and the principle of least-effort, Addison-Wesley, Cambridge, MA, 1949.


SCAM: A Copy Detection Mechanism for Digital Documents – Shivakumar, Garcia-Molina (1995)   (14 citations)  (Correct)

….unit is small (as in words). However, we see one advantage for small chunking units. A small chunking unit increases locality. That is, most documents will have a relatively small working set of words rather than sentences. Consider the frequency distribution of N words to follow Zipf's Law [26, 23, 11]. If the words are ranked in non-increasing order of frequencies, then the probability that a word w of rank r occurs is $P(w) = \frac{1}{r \sum_{v=1}^{N} 1/v}$. If we assume a vocabulary of about 1.8 million words [23], about 40,000 (about 2% of 1.8 million) words constitute nearly 75% of the actual ….

G.K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Cambridge, Massachusetts, 1949.
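
The arithmetic in this context can be checked directly. Under P(w) = 1 / (r * H_N), with H_N the N-th harmonic number, the top k ranks together carry H_k / H_N of all word occurrences. The sketch below assumes exactly that pure Zipf form; the function name top_share is hypothetical.

```python
def top_share(top_k, vocabulary_size):
    """Fraction of occurrences covered by the top_k most frequent words,
    assuming P(rank r) = 1 / (r * H_N) with N = vocabulary_size."""
    harmonic = 0.0
    harmonic_top = 0.0
    for v in range(1, vocabulary_size + 1):
        harmonic += 1.0 / v
        if v == top_k:
            harmonic_top = harmonic  # H_k, captured on the way to H_N
    return harmonic_top / harmonic


share = top_share(40_000, 1_800_000)
print(f"{share:.3f}")  # about 0.746, i.e. "nearly 75%"
```

The result agrees with the excerpt's figure: roughly 2% of a 1.8 million word vocabulary accounts for just under three quarters of the running text.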


Optimizations for Dynamic Inverted Index Maintenance – Cutting (1990)   (34 citations)  (Correct)

….rightmost path into core. This requires $\log_b(N/w)$ disk transactions. Hence $w \log_b(N/w) = w(\log_b N - \log_b w)$ (3) disk accesses are required on average to index n postings with this technique. If n is large the frequency distribution of unique words will approximately follow Zipf's law [8]. In other words $f(w)\,r(w) \approx z$ (4) for some constant z, where f(w) is the frequency of word w in the set of n instances and r(w) is the rank of f(w) among the frequencies of all words in that set of postings. Note that in this approximation z is both the vocabulary size and the frequency of the ….

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.


The Art of Massive Storage: A Case Study of a Web Archive – Talagala, Asami.. (1999)   (Correct)

….on the list from month to month. The percentage of image views that are due to these images decreases over time. This is most likely because the size of the on line collection has tripled during this time. We found that popular images are very popular. Image popularity follows a Zipf distribution [5]; the number of hits to a document varies inversely with its popularity rank. In this respect, an image archive is much like a library; there are a few very popular books and a huge collection of books that are very rarely accessed. In general, zooming in is the primary means of navigation. Nearly ….

G.K. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, 1949.


Generating Representative Web Workloads for Network and.. – Barford, Crovella (1998)   (254 citations)  (Correct)

….among individual files. The distribution of popularity has a strong effect on the behavior of caches (e.g. buffer caches in the file system) since popular files will typically tend to remain in caches. Popularity distribution for files on Web servers has been shown to commonly follow Zipf's Law [22, 1, 7]. Zipf's Law states that if files are ordered from most popular to least popular, then the number of references to a file (P) tends to be inversely proportional to its rank (r). That is: $P = k r^{-1}$ for some positive constant k. This property is surprisingly ubiquitous and empirical ….

G. K. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, 1949.
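
One common way to check whether reference counts follow P = k / r is to fit a straight line to log(P) against log(rank) and see whether the slope is close to -1. The sketch below uses ordinary least squares in plain Python; this is a generic technique chosen for illustration, not necessarily the method of the cited paper, and the function name loglog_slope is hypothetical.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) versus log(rank).

    counts: positive reference counts, one per file (at least two).
    A slope near -1 is consistent with Zipf's law.
    """
    ranked = sorted(counts, reverse=True)  # rank 1 = most popular
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(p) for p in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic counts that follow P = 1000 / r exactly.
slope = loglog_slope([1000.0 / r for r in range(1, 51)])
```

A log-log regression is easy to compute but is sensitive to noise in the low-count tail, so the fitted slope should be treated as a rough indicator.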


Long Term Resource Allocation in Video Delivery Systems – Almeroth, Dan, al. (1997)   (3 citations)  (Correct)

….special case with load surges at hour intervals. This type of workload is based on the belief that broadcast programs that start and end at the start/end of each hour will still exist and impact on demand behavior. 2. Movie Selection: For each request, a movie is selected using a Zipf distribution [18] which states that the probability that movie i is chosen equals $c / i^{1-\theta}$, where c is a normalizing constant. Empirical evidence presented in [17] suggests that $\theta = 0.271$. The Zipf distribution roughly translates to an above average number of requests for popular or hot movies and less ….

G. Zipf, Human Behavior and the Principle of Least Effort. Reading, MA: Addison-Wesley, 1949.
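
A simulation of the movie selection step might draw each request from this distribution. The sketch below assumes the form p_i = c / i^(1 - theta) with the skew value 0.271 quoted in the excerpt; the function names, the parameter name theta and the fixed seed are illustrative assumptions.

```python
import random


def movie_probabilities(n_movies, theta=0.271):
    """p_i = c / i**(1 - theta) for movies ranked 1..n_movies."""
    weights = [1.0 / i ** (1.0 - theta) for i in range(1, n_movies + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_requests(n_movies, n_requests, theta=0.271, seed=0):
    """Draw movie ranks (1 = most popular) for a batch of requests."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    probs = movie_probabilities(n_movies, theta)
    return rng.choices(range(1, n_movies + 1), weights=probs, k=n_requests)


requests = sample_requests(n_movies=100, n_requests=10_000)
```

With 100 movies the top title draws roughly a tenth of all requests while the least popular one draws well under one percent, which matches the "hot movies" reading in the excerpt.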


A Unified Model of Lexical Acquisition and Lexical Access – Brent (1996)   (Correct)

….is less than one. The strength of this preference increases as the frequency of the familiar word decreases. Thus, words that are both short and rare may not be segmented out. This trade-off is consistent with the observation that, in natural languages, short words are almost always common (Zipf, 1949). Overlapping familiar words. What does the DR strategy imply for overlapping familiar words? One interesting prediction arises when the choice between the overlapping words affects the number of words in the segmentation. For example, consider the segmentation of an imaginary utterance abcde, ….

Zipf, G. (1949). Human Behavior and the Principle of Least Effort. New York, NY: Addison-Wesley.


Semantic Feature Extraction From Technical Texts With Limited.. – Agarwal (1995)   (3 citations)  (Correct)

….veterinary medicine domain. Existing taxonomies like WordNet are very good at classifying such unambiguous words into precise classes. Another problem that is typically encountered with distributional techniques is that they do not work very well for low frequency words. We know from Zipf s Law (Zipf 68 1949) that there are usually a large number of words that occur in a domain with low frequency 5 and one cannot afford not to find semantic classes for most of them. Taxonomic techniques like WordNet are not affected by the frequency of the word and are hence very useful in improving the coverage of ….

Zipf, G. 1949. Human behavior and the principle of least effort. N.Y.: Hafner.


Tagging an Unfamiliar Text With Minimal Human Supervision – Brill, Marcus (1992)   (8 citations)  (Correct)

…. [Figure 2: Improving Accuracy. x-axis: number of rules.] made by the statistical techniques. Learning correcting rules can be an effective approach when the distribution of errors somewhat follows Zipf's Law [Zipf 1949]. If Zipf's Law is obeyed, then a small number of high probability error types will account for a large percentage of total error tokens. Such a distribution is amenable to the rule-based correction approach: the fact that there is a small number of high probability error types ensures that ….

Zipf, G. 1949. Human Behavior and the Principle of Least Effort. New York: Hafner Pub. Co.


Changes in Web Client Access Patterns -.. – Barford.. (1999)   (99 citations)  (Correct)

…. et al. 1998; Crovella and Bestavros 1997; Barford and Crovella 1998]. A good recent review of progress in characterizing Web workloads is given in [Pitkow 1997]. In characterizing the relative number of requests made to different Web documents, previous work has often referred to Zipf's law [Zipf 1949] (discussed in [Mandelbrot 1983]). Zipf's law was originally applied to the relationship between a word's popularity in terms of rank and its frequency of use. It states that if one ranks the popularity of words used in a given text (denoted by $\rho$) by their frequency of use (denoted by $P$) then ….

Zipf, G. K. (1949), Human Behavior and the Principle of Least-Effort , Addison-Wesley, Cambridge, MA.


Information, Adaptive Contracting, and Distributional.. – De Vany, Walls (1995)   (Correct)

…. model of preemption in the cereal market could produce this market fractioning and a geometric share distribution, or the looser log series distribution (see the next paragraph) 12 This form of the Pareto law, which uses rank instead of the number of firms above a certain size, is known as Zipf s (1949) law. IV. Empirical Box Office Revenue Distributions The data Our data are the box office revenues of Variety s Top 50 motion pictures by week. The Variety sample is a computerized weekly report of domestic film box office performance in major and medium metropolitan market areas. These data ….

Zipf, G. K. (1949). Human Behavior and the Principle of Least Effort.


Implementations of Partial Document Ranking Using Inverted Files – Wong, Lee (1993)   (14 citations)  (Correct)

….inverted file is shown below: [term $t_i$ | $df$ | $tf_{1,\max}$ | $tf_{2,\max}$ | $\cdots$] → ($d_j$, $tf_x$) ($d_k$, $tf_y$) $\cdots$ The processing and storage overheads of the SW will be higher than the first two methods. However, since most postings lists are short according to Zipf's law (Zipf, 1949) and therefore will occupy only a small number of disk pages, the overheads in keeping the tf values and maintaining the order of the disk pages are insignificant. This is especially true for environments where updates are done in batch and are infrequent compared to retrieval. Disk pages are ….

G. K. Zipf (1949). Human Behavior and the Principle of Least Effort. Reading, MA, Addison Wesley Publishing.


Zipf’s Law – Poosala   Self-citation (Zipf)   (Correct)

….This report deals with another such empirical phenomenon which has been observed in fields as diverse as population distribution, word usage and biological genera and species. G. K. Zipf first proposed a law (named Zipf s law) which he observed to be approximately obeyed in many of these domains [Zipf 49] This ubiquitous empirical regularity suggests the presence of a universal principle. This report mainly concentrates on various formulations of the law and describes a few attempts at statistically explaining its theoretical underpinnings. In particular, the work relating to frequency of usage ….

….2 (3) where $K' = K/n$. Equation (3) is the size-frequency relation corresponding to (1). G. K. Zipf attempted to explain the origins of the law in the nature of human behavior, through the principle of least effort. The above formulation has the following deficiencies: 1. Zipf's explanation [Zipf 49] in terms of human behaviour does not explain the underlying statistical process. 2. The value of the constant $K'$ in (3) depends on the number of objects n. 3. As discussed below, a statistical explanation for the phenomena observed by Zipf leads to a family of distributions and (3) is just a ….

G. K. Zipf, "Human behavior and the principle of least effort", 1949, Addison-Wesley, Reading MA.


Changes in Web Client Access Patterns -.. – Barford.. (1998)   (99 citations)  (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, 1949.


Server-initiated Document Dissemination for the WWW – Bestavros, Cunha (1996)   (56 citations)  (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, 1949.


Of Wealth Power and Law: the Origin of Scaling in Economics – Levy, Solomon   (Correct)

No context found.

Zipf, G. K. Human Behavior and the Principle of Least Effort (Addison Wesley, Cambridge MA, 1949).


Performance of Inverted Indices in Shared-Nothing.. – Tomasic, Garcia-Molina (1993)   (22 citations)  (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, 1949.


Characterizing Reference Locality in the WWW – Almeida, Bestavros, Crovella.. (1996)   (140 citations)  (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, 1949.


Characteristics of WWW Client-based Traces – Cunha, Bestavros, Crovella (1995)   (161 citations)  (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, 1949.


Characterizing Reference Locality in the WWW – Almeida, Bestavros, Crovella, al. (1996)   (140 citations)  (Correct)

No context found.

G. K. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, 1949.


Selective Placement and Replication Strategies for Storing.. – Shahabi, Khan   (Correct)

No context found.

G. K. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, Reading MA, 1949.


The Effect of Correlated Faults on Software Reliability – Wu, Malaiya (1993)   (Correct)

No context found.

G. K. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, 1949.

Striping for Interactive Video: Is it Worth it? – Reisslein, Ross, al. (1998)   (Correct)

….analysis we assume that the user demand for videos varies from video to video. Specifically, if there are M videos with video 1 being the most popular and video M being the least popular, then the probability that the mth most popular video is requested by a user is given by the Zipf distribution [11]: $q_m = K / m^{\zeta}$, $m = 1, \ldots, M$, where $K = 1 / \left(1 + 1/2^{\zeta} + \cdots + 1/M^{\zeta}\right)$. The Zipf distribution corresponds to a highly localized user request pattern that has been typical at movie rental stores. Note that the Zipf distribution depends on a parameter $\zeta \ge 0$. Increasing $\zeta$ increases the relative popularity of the ….

G. K. Zipf. Human Behavior and Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Cambridge, MA, 1949.


Interactive Video Streaming with Proxy Servers – Reisslein, Hartanto, Ross (1999)   (11 citations)  (Correct)

….show that replication and striping of popular objects in the proxy significantly improve the hit rate and throughput of the proxy as well as the user-perceived media quality. Throughout our performance study we assume that the requests for continuous media objects follow the Zipf distribution [40]. Specifically, if there are M objects, with object 1 being the most popular and object M being the least popular, then the probability that the mth most popular object is requested is $q_m = K / m$, $m = 1, \ldots, M$, where $K = 1 / \left(1 + 1/2 + \cdots + 1/M\right)$. The Zipf distribution, which is characterized by ….

G. K. Zipf. Human Behavior and Principle of Least Effort: An Introduction to Human Ecology. Addison Wesley, Cambridge, MA, 1949.


A Classification Approach for Prediction of Target.. – Domeniconi, Perng..   (Correct)

….indicating its class. Sparse Document Vectors. From the large feature space, only a few words occur in a single document. Frequency Distribution of Words follows Zipf's Law. This implies that there is a small number of words that occurs very frequently while most words occur very infrequently [17]. It is possible to connect these properties of text classification tasks with the generalization performance of an SVM [8]. In particular, the listed properties necessarily lead to large margin separation. Moreover, large margin, combined with low training error, is a sufficient condition for ….

Zipf, G.K.(1949). Human behavior and the principle of least effort: An introduction to human ecology . Addison-Wesley Press.


Analysis of Web Caching Architectures: Hierarchical.. – Rodriguez, Spanner.. (2001)   (8 citations)  (Correct)

….We consider that each document is requested independently from other documents, so we are neglecting any source of correlation between requests of different documents. Let fi I be the request rate from an institutional cache for all N documents, fi I = P N I;i . fi I is Zipf distributed [7] [35], that is, if we rank all N documents in order of their popularity, the i Gamma th most popular document has a request rate I;i given by I;i = fi I ; where ff is a constant that determines how skewed the Zipf distribution is, and oe is given by oe = Gamma1 The Zipf .

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison-Wesley, Reading, MA, 1949.


Learning a Monolingual Language Model from a Multilingual Text .. – Ghani, Jones (2000)   (Correct)

….models from multiple databases. They are motivated by the fact that word occurrences follow a highly skewed distribution, with a few words occurring very often, and most words occurring rarely. In the light of evidence suggesting that the important vocabulary words occur frequently in a database [5, 9, 13], it is probable that these words might be acquired by sampling. Callan et al. show that if queries can be run and documents retrieved, then it is possible to sample the contents of each database in a way that will produce an accurate language model for the database. We extend query based ….

G. K. Zipf. Human behavior and the principle of least effort: An introduction to human ecology. Addison-Wesley, Cambridge MA, 1949.


The Effects of Query-Based Sampling on Automatic.. – Callan, French.. (2000)   (2 citations)  (Correct)

….Database contents might be misrepresented, either deliberately or accidentally. What if complete language models are not actually necessary for accurate database selection? Zipf's law indicates that in any text database half of the unique terms can be expected to occur just once or twice [19]. Such terms might be given high scores by a tf.idf model, but few people would argue that such terms accurately describe the contents of a database. Indeed, it is likely that many of the terms in a language model could be discarded without adversely affecting its descriptive power. Partial ….

G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, MA, 1949.
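
The claim quoted above, that about half of the unique terms in a text database occur only once or twice, can be checked on any tokenized text. The following is a minimal sketch only: the function name `rare_term_fraction`, its `max_count` parameter, and the toy token list are invented here for illustration and do not come from the cited paper.

```python
from collections import Counter


def rare_term_fraction(tokens, max_count=2):
    """Return the fraction of unique terms that occur at most max_count times."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    rare = sum(1 for c in counts.values() if c <= max_count)
    return rare / len(counts)


# Toy token list, invented purely for illustration (not real corpus data).
toy_tokens = "the cat sat on the mat and the dog sat on the log".split()
fraction = rare_term_fraction(toy_tokens)
```

On a real collection the token list would come from whatever tokenizer the retrieval system already uses; a sample this small says nothing about Zipf's law itself.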


Indexing Multimedia Databases – Faloutsos   (Correct)

….Space overhead • Expensive insertions • STAIRS [IBM] • MEDLARS • DIALOG, ORBIT, LEXIS • refer, lookbib [Les78]. Recent developments and challenges: • skewness of distribution (Zipf's law) [Zip49]: hybrid methods [FJ92a], adaptive postings lists [FJ92b] • huge indices; fast insertions: Tomasic et al. [TGMS94]; Cutting and Pedersen [CP90] exploit skewness (short lists in B-tree, long lists on a separate file); Zobel et al. [ZMSD92] use Elias's [Eli75] compression scheme for postings .

G.K. Zipf. Human Behavior and Principle of Least Effort: an Introduction to Human Ecology. Addison Wesley, Cambridge, Massachusetts, 1949.


Improving RAID Performance Using a Multibuffer Technique – Kien Hua Khanh   (Correct)

….access skew is zero. Access size skew: This is the skew condition of the data sizes requested by the I/O operations. A larger access size skew indicates that most of the I/O operations involve a large number of data blocks. We model the various skew conditions using a Zipf-like distribution [8, 10]. For instance, to simulate an access skew of z, we used the following distribution function to determine the access probability $P_i$ of block i: $P_i = \frac{1/i^{z}}{\sum_{j=1}^{n} 1/j^{z}}$, where n is the number of blocks in the database, i.e. 10,000. Note that when z = 0, the distribution becomes a ….

G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, MA, 1949.


Pre-Admission Control for Movie-on-Demand Systems – Tavanapong, Hua, Sheu   (Correct)

….for all subscribers. We describe the characteristics of the requests as follows: • Interarrival times are modeled using Poisson distribution with interarrival rate . • The workload consists of V movie files with no replicas. Access frequency of a movie is modeled by a Zipf-like distribution [17] as follows: $f_i = \frac{1/i^{z}}{\sum_{j=1}^{V} 1/j^{z}}$, where $1 \le i \le V$; V is the number of movie files in the system, and z is the skew factor. When z is zero, every movie is accessed with the same frequency. When z approaches one, some movies are accessed very frequently while others are rarely ….

G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, Mass., 1949.
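
The Zipf-like access model used in the excerpt above (and in several of the neighbouring ones) is easy to compute directly. This is a rough sketch under stated assumptions: the function name `zipf_like_probabilities` and its parameter names are hypothetical and not taken from any of the cited papers.

```python
def zipf_like_probabilities(num_items, skew):
    """Return f_i = (1 / i**skew) / sum_j (1 / j**skew) for ranks i = 1..num_items."""
    if num_items < 1:
        raise ValueError("num_items must be at least 1")
    # Unnormalized weight of each rank, most popular item first.
    weights = [1.0 / (rank ** skew) for rank in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Skew 0 gives a uniform distribution; skew 1 gives the classic 1/i shape.
uniform = zipf_like_probabilities(5, 0.0)
skewed = zipf_like_probabilities(5, 1.0)
```

With a skew of zero every item gets the same probability, which matches the remark in the excerpt; larger skew values concentrate the probability mass on the top-ranked items.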


BiHOP: A Bidirectional Highly Optimized Pipelining Technique.. – Hua, Wang, Sheu (1997)   (2 citations)  (Correct)

….playback of several video files. In terms of the workloads, each user request is characterized by an interarrival time and choice of object. User request interarrivals were modeled using a Poisson process. The access frequencies of objects in the database follow a Zipf-like distribution [5, 6, 9]. Let n be the total number of requests for a simulation run. The number of requests for each object $O_i$ is determined as follows: $R_i = \frac{n}{i^{z} \cdot \sum_{j=1}^{v} 1/j^{z}}$, where v is the number of objects in the system, and $0 \le z \le 1$ is the skew factor. A larger z value corresponds to a more skew ….

G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, Mass., 1949.


Performance of Load Balancing Techniques for Join.. – Hua, Tavanapong, Lo   (Correct)

….is called a hash partition. We note that an initial partition consists of tuples from both operand relations; a hash partition contains only tuples from the skew relation. In our study, the size of each hash partition, say $P_i$, is determined using the following Zipf-like distribution function [26]: $|P_i| = \frac{|S|}{i^{Z_p} \cdot \sum_{j=1}^{N} 1/j^{Z_p}}$, where N is the number of PNs, $|S|$ denotes the number of tuples in relation S, and $Z_p$ is called the skew condition in this paper. Thus, we can have control of the partition skew, and therefore the imbalance condition during and after the hash ….

G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, Mass., 1949.


The Vocabulary Problem in Human-System Communication: .. – Furnas, Landauer.. (1987)   (106 citations)  (Correct)

….Vocabulary Problem in Human-System Communication: an Analysis and a Solution. G. W. Furnas, T. K. Landauer, L. M. Gomez, S. T. Dumais, Bell Communications Research. 1. Introduction. Many functions of most large systems depend on users typing in the right words. New or intermittent users often use the wrong words and fail to get the actions or information they want. This is the vocabulary problem. It is a troublesome impediment in computer interactions both ….

….Table 1a indicates that 22 subjects proposed the word "change" as a descriptor for the editing operation indicated by a crossed out word by an author, and referred to in the table as "delete". All the tables were very sparse. One reason is that word usage tended to follow Zipf's distribution [14]: a few words are used very frequently, the vast majority only rarely; more importantly however, most words are applied to only a few objects. Because they estimate the likelihood of users or designers giving a word for an object, these tables allow us to simulate the performance of various ….

Zipf, G.K., Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, MA, 1949.


Analysis of Web Caching Architectures: Hierarchical.. – Rodriguez, Spanner.. (2000)   (8 citations)  (Correct)

….We consider that each document is requested independently from other documents, so we are neglecting any source of correlation between requests of different documents. Let $\beta_I$ be the request rate from an institutional cache for all N documents, $\beta_I = \sum_{i=1}^{N} \lambda_{I,i}$. $\beta_I$ is Zipf distributed [7] [35], that is, if we rank all N documents in order of their popularity, the i-th most popular document has a request rate $\lambda_{I,i}$ given by $\lambda_{I,i} = \beta_I \sigma / i^{\alpha}$, where $\alpha$ is a constant that determines how skewed the Zipf distribution is, and $\sigma$ is given by $\sigma = \left( \sum_{i=1}^{N} 1/i^{\alpha} \right)^{-1}$ .

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison-Wesley, Reading, MA, 1949.


Scalable Content Distribution In The Internet – Rodriguez (2000)   (1 citation)  (Correct)

….request rate for the N documents. We order the documents so that document 1 is the most popular document and document N is the least popular. We suppose that the probability that a request is for the i-th document is given by the Zipf distribution [94] [19], i.e. $j / i^{\alpha}$, where $\alpha$ is the Zipf parameter and $j$ is given by $j = \left( \sum_{i=1}^{N} i^{-\alpha} \right)^{-1}$. We suppose that each document is requested independently from other documents. The request rate for the i-th document therefore is $\beta j / i^{\alpha}$. The parameter $\alpha$ varies between 0.64 ….

….Assuming that requests for document i are uniformly distributed among all $O^{2H}$ institutional caches, there are $\lambda_{I,i} \cdot O^{2H}$ total requests for document i. Let $\beta_I$ be the request rate from an institutional ISP for all $N_j$ documents, $\beta_I = \sum_{i=1}^{N_j} \lambda_{I,i}$. $\beta_I$ is Zipf distributed [19] [94] (see Section 3.3.2). We consider that newly appearing documents will also be Zipf distributed. That is, there will be some new appearing documents that will be very popular but there will also be many other new documents that will be requested by few clients. Let be the percentage of requests ….

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison-Wesley, Reading, MA, 1949.


Distributed Information Retrieval – Callan (2000)   (9 citations)  (Correct)

….Resource descriptions based on terms and their frequencies are generally a small fraction of the size of the original text database. The size is proportional to the number of unique terms in the database. Zipf's law indicates that the rate of vocabulary growth decreases as database size increases (Zipf, 1949), hence the resource descriptions for large databases are a smaller fraction of the database size than the resource descriptions for small databases. 4. RESOURCE SELECTION. Given an information need and a set of resource descriptions, how is the system to select which resources to search? The ….

….be sampled more efficiently; otherwise, random sampling is best. The language models for all three databases required about the same number of documents to converge. Database size and heterogeneity had little effect on the rate of convergence. This characteristic is consistent with Zipf's law (Zipf, 1949), which states that the rate at which new terms are found decreases with the number of documents examined. Zipf's law places no constraints on the order in which documents in a database are examined. Whether documents are selected sequentially or by query based sampling, only a relatively small ….

Zipf, G. K. (1949). Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, MA.


Analysis of Web Caching Architectures: Hierarchical and.. – Pablo Rodriguez.. (2001)   (8 citations)  (Correct)

….that each document is requested independently from other documents, so we are neglecting any source of correlation between requests of different documents. Let $\beta_I$ be the request rate from an institutional cache for all N documents, $\beta_I = \sum_{i=1}^{N} \lambda_{I,i}$. $\beta_I$ is Zipf distributed [4] [27], that is, if we rank all N documents in order of their popularity, the i-th most popular document has a request rate $\lambda_{I,i}$ given by $\lambda_{I,i} = \beta_I \sigma / i^{\alpha}$, where $\alpha$ is a constant that determines how skewed the Zipf distribution is, and $\sigma$ is given by $\sigma = \left( \sum_{i=1}^{N} 1/i^{\alpha} \right)^{-1}$ .

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison-Wesley, Reading, MA, 1949.


Learning a Monolingual Language Model from a Multilingual Text .. – Ghani, Jones (2000)   (Correct)

….databases. Query based sampling is motivated by the fact that word occurrences follow a highly skewed distribution, with a few words occurring very often, and most words occurring rarely. In the light of evidence suggesting that the important vocabulary words occur frequently in a database [5, 9, 13], it is probable that these words might be acquired by sampling. Callan et al. show that if queries can be run and documents retrieved, then it is possible to sample the contents of each database in a way that will produce an accurate language model for the database. We extend query based ….

G. K. Zipf. Human behavior and the principle of least effort: An introduction to human ecology. Addison-Wesley, Cambridge MA, 1949.


A Theoretical Framework For Abundance Distributions In Complex.. – Halloy   (Correct)

….right part of its range, e.g. when its left side is veiled. Lognormal distributions found in nature are generally canonical and are often veiled (May 1978; Preston 1980; Magurran 1988; Brown and Maurer 1989; Brown and Nicoletto 1991). Conversely, empirical data series fitted to a power function (Zipf 1949; Mandelbrot 1983; Magurran 1988) also exhibit a pronounced drop or convexity at the lower right side, a distribution which resembles an exponential function but often indicates lognormality. The distinction between exponential and lognormal can be seen in a frequency abundance representation, ….

….agents. The dynamics of diversification and abundance variations which occur in the resource attraction model do not occur in these physical systems. 6.5 Convergence with other models. It is no coincidence that many models produce polo distributions starting from many different explanations (e.g. Zipf 1949; John Conway's Game of Life, Berlekamp et al. 1982; Barlow 1994; Pahl-Wostl 1995) as the trend results from similar fundamentals such as neighbourhood interaction or competition for resources. Bak (1997) suggested that such models evolve to criticality, as do natural systems (e.g. Lockwood and ….

Zipf, G.K. (1949) Human Behavior and the Principle of Least Effort – An introduction to human ecology, Addison-Wesley, Cambridge, Mass.


Performance Study of Satellite-linked Web Caches and.. – Xiao-Yu Hu Pablo (2000)   (1 citation)  (Correct)

….[Fig. 9. Comparison of HR and WHR reduction due to different filtering policies (thresholds of 10, 30, 50, and 100 times; Dec. 19-23, 1998).] threshold is used. The main reason for this results from the Zipf distribution of document requests [3] [16], where only few Web sites account for most of the requests and there is a large set of documents that have very few requests. 5.4 Complexity and disk requirements of filtering policies. Up to now we have considered the performance of generic filtering policies based on previous requesting ….

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison-Wesley, Reading, MA, 1949.


Bringing the Web to the Network Edge: Large Caches and.. – Rodriguez, Biersack (2000)   (6 citations)  (Correct)

….that requests for document i are uniformly distributed among all $O^{2H}$ institutional caches, there are $\lambda_{I,i} \cdot O^{2H}$ total requests for document i. Let $\beta_{I,j}$ be the request rate from an institutional ISP for all $N_j$ documents, $\beta_{I,j} = \sum_{i=1}^{N_j} \lambda_{I,i}$. $\beta_{I,j}$ is Zipf distributed [6] [33], that is, if we rank all $N_j$ documents in the Web in order of their popularity, the i-th most popular document has a request rate $\lambda_{I,i}$ given by $\lambda_{I,i} = \beta_{I,j} \sigma / i^{\alpha}$, where $\alpha$ takes values between 0.6 and 0.8 [6] and $\sigma$ is given by $\sigma = \left( \sum_{i=1}^{N_j} 1/i^{\alpha} \right)^{-1}$. We assume ….

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison-Wesley, Reading, MA, 1949.


Bringing the Web to the Network Edge: Large Caches and.. – Pablo Rodriguez Ernst (2000)   (6 citations)  (Correct)

….( I;i Delta t) r r : Assuming that requests are uniformly distributed between all $O^{2H}$ institutional caches, there are $\lambda_{I,i} \cdot O^{2H}$ total requests for document i. Let $\beta_I$ be the request rate from a local ISP for all $N_j$ documents, $\beta_I = \sum_{i=1}^{N_j} \lambda_{I,i}$. $\beta_I$ is Zipf distributed [6] [32], that is, if we rank all $N_j$ documents in the Web in order of their popularity, the i-th most popular document has a request rate $\lambda_{I,i}$ given by $\lambda_{I,i} = \beta_I \sigma / i^{\alpha}$, where $\alpha$ takes values between 0.6 and 0.8 [6] and $\sigma$ is given by $\sigma = \left( \sum_{i=1}^{N_j} 1/i^{\alpha} \right)^{-1}$. We assume that .

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison-Wesley, Reading, MA, 1949.


Web Caching Architectures: Hierarchical and Distributed.. – Rodriguez, Spanner.. (1999)   (26 citations)  (Correct)

….i. Let $\beta_I$ be the request rate from a institutional ISP for all N documents, $\beta_I = \sum_{i=1}^{N} \lambda_{I,i}$. Let $\beta_{tot} = \beta_I \cdot O^{2H}$ be the total request rate from all the $O^{2H}$ institutional caches for all documents. The request rate for the documents in the Web is Zipf distributed [4] [23], that is, if we rank all N documents in the Web in order of their popularity, the i-th most popular document has a request rate $\lambda_{tot,i}$ given by $\lambda_{tot,i} = \beta_{tot} \sigma / i^{\alpha}$, where $\alpha$ takes values between 0.6 and 0.8 [4] and $\sigma$ is given by $\sigma = \left( \sum_{i=1}^{N} 1/i^{\alpha} \right)^{-1}$. We assume .

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison-Wesley, Reading, MA, 1949.


Interactive Video Streaming with Proxy Servers – Reisslein, Hartanto, Ross (1999)   (11 citations)  (Correct)

….We show that replication and striping of popular objects in the proxy significantly improve the hit rate and throughput of the proxy as well as the user perceived media quality. Throughout our performance study we assume that the requests for continuous media objects follow the Zipf distribution [28]. Specifically, if there are M objects, with object 1 being the most popular and object M being the least popular, then the probability that the mth most popular object is requested is $q_m = K/m^{\zeta}$, $m = 1, \ldots, M$, where $K = 1/(1 + 1/2^{\zeta} + \cdots + 1/M^{\zeta})$. The Zipf ….

G.K. Zipf, Human Behavior and Principle of Least Effort: An Introduction to Human Ecology, Addison–Wesley, Cambridge, MA, 1949.


On Power-Law Relationships of the Internet Topology – Faloutsos, Faloutsos.. (1999)   (320 citations)  (Correct)

….as the human respiratory system [12] with a scaling factor of 2.9, and automobile networks [6] with an exponent of 1.6. Second, power laws are obeyed in diverse settings, like income distribution (the Pareto law) and the frequency distribution of words in natural text (the Zipf distribution [28]). 6 Conclusions. Our main contribution is a novel way to study the Internet topology, namely through power laws. These power laws capture concisely the highly skewed distributions of the graph properties and quantify them by single numbers, the power law exponents. Our contributions can be ….

G.K. Zipf. Human Behavior and Principle of Least Effort: An Introduction to Human Ecology. Addison Wesley, Cambridge,


Interactive Video Streaming with Proxy Servers – Reisslein, Hartanto, Ross (2000)   (11 citations)  (Correct)

….loss. IV. REPLICATION AND STRIPING OF VIDEO OBJECTS. In this section we study the impact of the placement of video objects in the proxy's disk array on the proxy's performance. Throughout our performance study we assume that the requests for continuous media objects follow the Zipf distribution [23]. Specifically, if there are M objects, with object 1 being the most popular and object M being the least popular, then the probability that the mth most popular object is requested is $q_m = K/m^{\zeta}$, $m = 1, \ldots, M$, where $K = 1/(1 + 1/2^{\zeta} + \cdots + 1/M^{\zeta})$. The Zipf ….

G.K. Zipf, Human Behavior and Principle of Least Effort: An Introduction to Human Ecology, Addison–Wesley, Cambridge, MA, 1949.


Challenges for Tertiary Storage in Multimedia Servers – Chervenak (1998)   (6 citations)  (Correct)

….probability of requesting the nth most popular of M movies is $C/n$, where $C = 1/(1 + 1/2 + 1/3 + \cdots + 1/M)$. This distribution is highly localized, with 90% of requests going to the most popular 10% of movies. The Zipf distribution is widely used for characterizing requests to multimedia servers [42] [8]. 3 Write Once Optical Disk. In this section, we examine the use of optical disk drives, focusing particularly on consumer products such as CD-ROM and DVD as components of a large multimedia server. We begin by examining the properties of write once optical media and drives. Then we present ….

G.K. Zipf. Human Behavior and Principle of Least Effort: an Introduction to Human Ecology. Addison Wesley, Cambridge, Massachusetts, 1949.


Striping for Interactive Video: Is it Worth it? – Reisslein, Ross, Shrestha (1999)   (Correct)

….we assume that the user demand for videos varies from video to video. Specifically, if there are M videos with video 1 being the most popular and video M being the least popular, then the probability that the mth most popular video is requested by a user is given by the Zipf distribution [1]: $q_m = K/m^{\zeta}$, $m = 1, \ldots, M$, where $K = 1/(1 + 1/2^{\zeta} + \cdots + 1/M^{\zeta})$. The Zipf distribution corresponds to a highly localized user request pattern that has been typical at movie rental stores. Note that the Zipf distribution depends on a parameter $\zeta \ge 0$. Increasing $\zeta$ ….

G. K. Zipf, Human Behavior and Principle of Least Effort: An Introduction to Human Ecology, Addison–Wesley, Cambridge, MA, 1949.


Selectivity Estimation of Window Queries for Line Segment.. – Guido Proietti (1998)   (2 citations)  (Correct)

….one typically makes the uniformity and independence assumption on them. Unfortunately, these assumptions do not hold for real datasets and generally lead to pessimistic results [3]. Whereas for one-dimensional data some developed nonuniform distributions (like for example the Zipf distribution [14]) have met with success, for multi-dimensional data difficulties have not been overcome yet. In fact, some proposed non-uniform model (such as, for instance, clustering ad hoc methods [1, 11]) are not flexible enough to be applied to a large variety of data. Recently, the introduction of the ….

G.K. Zipf. Human behavior and principle of least effort: an introduction to human ecology. Addison Wesley, Cambridge, MA, 1949.


Density Biased Sampling: An Improved Method for Data Mining.. – Palmer, Faloutsos (1999)   (11 citations)  (Correct)

….as was shown in the example. Instead, it seems more likely that cluster sizes will follow a Zipf distribution. Zipf distributions occur extremely frequently in practice: they have been found in the frequency distribution of vocabulary words in text (English and Latin works of literature [24]; Bible [6]), the distribution of city populations [24], distribution of first and last names of people [5], sales patterns [6], income distributions (the Pareto law [20]), and distribution of web site hits [13]. The main contribution of this paper is to introduce a new sampling technique and an .

….more likely that cluster sizes will follow a Zipf distribution. Zipf distributions occur extremely frequently in practice: they have been found in the frequency distribution of vocabulary words in text (English and Latin works of literature [24]; Bible [6]), the distribution of city populations [24], distribution of first and last names of people [5], sales patterns [6], income distributions (the Pareto law [20]), and distribution of web site hits [13]. The main contribution of this paper is to introduce a new sampling technique and an efficient algorithm that improves on uniform sampling .

G.K. Zipf. Human Behavior and Principle of Least Effort: An Introduction to Human Ecology. Addison Wesley, Cambridge, Massachusetts, 1949.


Striping for Interactive Video: Is it Worth it? – Reisslein, Ross, Shrestha (1999)   (Correct)

….analysis we assume that the user demand for videos varies from video to video. Specifically, if there are M videos with video 1 being the most popular and video M being the least popular, then the probability that the mth most popular video is requested by a user is given by the Zipf distribution [11]: $q_m = K/m^{\zeta}$, $m = 1, \ldots, M$, where $K = 1/(1 + 1/2^{\zeta} + \cdots + 1/M^{\zeta})$. The Zipf distribution corresponds to a highly localized user request pattern that has been typical at movie rental stores. Note that the Zipf distribution depends on a parameter $\zeta \ge 0$. Increasing $\zeta$ ….

G. K. Zipf. Human Behavior and Principle of Least Effort: An Introduction to Human Ecology. Addison–Wesley, Cambridge, MA, 1949.


Scalable Feature Selection, Classification and.. – Chakrabarti, Dom.. (1998)   (5 citations)  (Correct)

….how to use the class models to extract context sensitive document signatures. 3.1 Document model There have been many proposals for statistical models of text generation. One of the earliest indicators of the power of simple rules derived from both quantitative and textual data is Zipf s law [48]. The models most frequently used in the IR community are Poisson and Poisson mixtures [37, 42] If X is distributed Poisson with rate , denoted X # P ( then Pr[X = x] e x x and if Y is distributed Bernoulli with n trials and mean np, denoted Y # B (n, p) then Pr[Y = y] # ….

Zipf GK (1949) Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, Mass


Bit-Sliced Signature Files for Very Large Text Databases.. – Panagopoulos, Faloutsos (1994)   (5 citations)  (Correct)

….trend in commercial workstations. A response time of 2 sec can be expected from a Sparc-like architecture with a main memory buffer of 16MB, operating on a 5GB database (figure 6). Future research could examine (a) the effects of skewness in the frequencies of query terms (e.g. Zipf distribution [Zip49]), (b) the parallelization of vertical [LF92] and horizontal [SD83, LL89] partitioning signature methods, and (c) the parallelization of hybrid methods that combine signature retrieval with inverted indices [FJ91] .

G. K. Zipf. Human Behavior and Principle of Least Effort: An Introduction to Human Ecology. Addison–Wesley, Cambridge, MA, 1949.


Automatic Discovery of Language Models for Text Databases – Callan, Connell, Du (1999)   (42 citations)  (Correct)

….for learning an accurate language model of the entire database. It is an open question how large a sample is required to construct language models of a specified accuracy. Word occurrences follow a highly skewed distribution, with a few words occurring very often, and most words occurring rarely [16]. Words in the middle of the frequency range are thought to be the most useful for distinguishing among documents within a single database [10]. There is also evidence that highly frequent words may be useful for distinguishing among databases [3]. These bits of evidence suggest that the important ….

….such as co-occurrence based query expansion (Section 8). One important piece of information that appears difficult to acquire by sampling is the size of the database. Zipf's law and empirical evidence show that vocabulary growth slows, but does not stop, as additional documents are seen [16, 9], and that this rate is independent of database size. Hence it is unclear how to estimate database size by sampling. Database size is primarily used by selection algorithms to scale the word frequencies in language models provided for databases of varying sizes. It is likely that a similar effect ….

[Article contains additional citation context not shown here]

G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, MA, 1949.


Dynamic Load Balancing in Hierarchical Parallel Database.. – Bouganim, Florescu.. (1996)   (10 citations)  (Correct)

….skew in the production of trigger activations and in all operators producing pipelined tuples. For simplicity, the skew factor of a producer operator does not impact that of the consumer operator. [Figure 7: Impact of data skew on DP] All operators have the same skew factor based on a Zipf function [Zip49] that yields a factor between 0 (no skew) and 1 (high skew). Figure 7 shows the relative performance of DP versus the skew factor with 64 processors, the reference response time being that with no skew. The important conclusion is that the impact of skew on our model is insignificant. This is due ….

G. K. Zipf, “Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology”. Reading, MA, Addison-Wesley, 1949.


Document Filtering With Inference Networks – Callan (1996)   (17 citations)  (Correct)

….speed is affected by the number of profiles, because inverted lists are built only for terms in the profile term dictionary. As more profiles are added, the vocabulary grows larger. Fortunately, adding a large number of profiles causes only a small increase in the size of the term dictionary [16], and therefore only a small decrease in document parsing speed. 3.4 Comparing a Document to Profiles After a document is indexed, it can be compared to a clipset. Retrospective document retrieval systems owe their speed partially to indexing methods, such as inverted lists, that enable the ….

G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, MA, 1949.


Choosing the Best Storage System for Video Service – Chervenak, Patterson, Katz (1995)   (25 citations)  (Correct)

….data available on the frequency of these operations. For simplicity, we ignore them in this study. Based on video store rental patterns, we assume that accesses to movies in a video server will be highly localized, with a small number of movies receiving most of the accesses. We use Zipf's Law [27] to characterize this locality. Zipf's Law states that the probability of choosing the nth most popular of M movies is $C/n$, where $C = 1/(1 + 1/2 + 1/3 + \cdots + 1/M)$. For example, using this distribution, the 5th most popular movie is requested one fifth as often as the most popular movie. We ….

G.K. Zipf. Human Behavior and Principle of Least Effort: an Introduction to Human Ecology. Addison Wesley, Cambridge, Massachusetts, 1949.
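
A simulated request stream that follows the C/n rule quoted above can be drawn with inverse-CDF sampling. The sketch below is only an illustration: `make_zipf_sampler`, the seed, and the catalogue size of 100 movies are assumptions made here, not details from the cited study.

```python
import bisect
import itertools
import random


def make_zipf_sampler(num_movies, rng):
    """Build a sampler returning ranks 1..num_movies with P(rank n) = C / n."""
    weights = [1.0 / n for n in range(1, num_movies + 1)]
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]

    def sample():
        # Inverse-CDF sampling: pick the first rank whose cumulative weight
        # exceeds a uniform draw scaled to the total weight.
        u = rng.random() * total
        index = bisect.bisect_right(cumulative, u)
        # Guard against floating-point edge cases at the upper end.
        return min(index, num_movies - 1) + 1

    return sample


rng = random.Random(1949)  # arbitrary seed, used only for repeatability
sample = make_zipf_sampler(100, rng)
requests = [sample() for _ in range(10_000)]
```

Counting how often each rank appears in `requests` gives an empirical check of the skew; the normalizing constant C never has to be computed explicitly because the uniform draw is scaled by the total weight.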


Continuous speech recognition in the WAXHOLM dialogue system – Ström (1996)   (Correct)

….lexicon does not occur in the training data even once is not uncommon. This leads to the conclusion that an a priori distribution for the word frequencies is necessary. Zipf's law states that the logarithm of the word frequency is approximately proportional to the logarithm of the rank of the word (Zipf, 1949; Pierce, 1961; Li, 1992), where rank is defined such that the Nth most frequent word has rank N. We get: log log p R R p = r for some constant r, where R is the rank and p is the word frequency. If the word class is large then the difference in word frequency is small between two ….

Zipf GK (1949). Human Behavior and Principle of Least Effort: An Introduction to Human Ecology.


Web Caching Architectures: Hierarchical and Distributed Caching – Pablo Rodriguez   (26 citations)  (Correct)

….removed from the caches every $\Delta$ seconds. Requests from an institutional cache for document i, $1 \le i \le N$, are Poisson distributed with average request rate $\lambda_{I,i}$. Let $\beta_I$ be the request rate from an institutional cache for all N documents, $\beta_I = \sum_{i=1}^{N} \lambda_{I,i}$. $\beta_I$ is Zipf distributed [4] [21], that is, if we rank all N documents in order of their popularity, the i-th most popular document has a request rate $\lambda_{I,i}$ given by $\lambda_{I,i} = \beta_I \sigma / i^{\alpha}$, where $\alpha$ takes values between 0.6 and 0.8 [4] and $\sigma$ is given by $\sigma = \left( \sum_{i=1}^{N} 1/i^{\alpha} \right)^{-1}$. Assuming that requests for .

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison-Wesley, Reading, MA, 1949.


Using Taxonomy, Discriminants, and Signatures for.. – Soumen Chakrabarti (1997)   (2 citations)  (Correct)

….we will present in detail the techniques that make possible the capabilities mentioned before. 3.1 Document model There have been many proposals for statistical models of text generation. One of the earliest indicators of the power of simple statistical tests on term frequencies is Zipf s law [38]. The models most frequently used in the IR community are Poisson and Poisson mixtures [28, 33] If X is distributed Poisson with rate , denoted X P( then Pr[X = x] e Gamma x =x and if Y is distributed Bernoulli with n trials and mean np, denoted Y B(n; p) then Pr[Y = y] ….

G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, 1949.


Selectivity Estimation of Window Queries for Line Segment.. – Proietti, Faloutsos (1998)   (2 citations)  (Correct)

….one typically makes the uniformity and independence assumption on them. Unfortunately, these assumptions do not hold for real datasets and generally lead to pessimistic results [3]. Whereas for one-dimensional data some developed non-uniform distributions (like for example the Zipf distribution [14]) have met with success, for multi-dimensional data difficulties have not been overcome yet. In fact, some proposed non-uniform model (such as, for instance, clustering ad hoc methods [11, 1]) are not flexible enough to be applied to a large variety of data. Recently, the introduction of the ….

G.K. Zipf. Human behavior and principle of least effort: an introduction to human ecology. Addison


Analysis of Web Caching Architectures: – Hierarchical And Distributed   (Correct)

No context found.

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Reading, MA: Addison-Wesley, 1949.


TAPER: A Two-Step Approach for All-strong-pairs Correlation.. – Hui Xiong Student   (Correct)

No context found.

G. K. Zipf. Human Behavior and Principle of Least Effort: An Introduction to Human Ecology. Addison Wesley Press, Cambridge, Massachusetts, 1949.


Bringing the Web to the Network Edge: – Large Caches And   (Correct)

No context found.

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison-Wesley, Reading, MA, 1949.


Probabilistic Model for Contextual Retrieval – Ji-Rong Wen Jrwen   (Correct)

No context found.

Zipf, G.K., Human Behavior and Principle of Least Effort: an Introduction to Human Ecology, Addison Wesley, Cambridge, MA, 1949


Page Replacement Algorithm for Buffer Management – In The Omega   (Correct)

No context found.

Zipf G.K. Human Behavior and the Principle of Least Effort: an Introduction to Human Ecology. Reading, MA, Addison-Wesley, 1949.


Generating Referring Expressions in a Multimodal Context.. – van der Sluis, Krahmer (2001)   (1 citation)  (Correct)

No context found.

Zipf, G.K. (1949), Human behavior and the principle of least effort: An introduction to human ecology, Addison-Wesley, Cambridge.


Performance Analysis Of Tape Libraries For Supercomputing.. – Hamzaoglu, Simitci   (Correct)

No context found.

G. K. Zipf, “Human Behavior and Principle of Least Effort: An Introduction to Human Ecology”, Addison Wesley, 1949.


CiteSeer – Copyright NEC and IST
