title: Building a Distributed Full-Text Index for the Web
creator: Melnik, Sergey
creator: Raghavan, Sriram
creator: Yang, Beverly
creator: Garcia-Molina, Hector
subject: Databases and the Web
subject: Digital Libraries
description: We identify crucial design issues in building a distributed inverted index for a large collection of web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We propose and compare different strategies for addressing various issues relevant to distributed index construction. Finally, we present performance results from experiments on a testbed distributed indexing system that we have implemented.
publisher: Stanford
date: 2000-07
type: Techreport
type: NonPeerReviewed
format: application/pdf
identifier: http://ilpubs.stanford.edu:8090/448/1/2000-29.pdf
identifier: Melnik, Sergey and Raghavan, Sriram and Yang, Beverly and Garcia-Molina, Hector (2000) Building a Distributed Full-Text Index for the Web. Technical Report. Stanford.
relation: http://ilpubs.stanford.edu:8090/448/