Agents for personalized client-side information gathering from the Web

Somlo, Gabriel L., author; Howe, Adele E., advisor

dc.contributor.author	Somlo, Gabriel L., author
dc.contributor.author	Howe, Adele E., advisor
dc.date.accessioned	2026-02-23T19:19:20Z
dc.date.issued	2005
dc.description.abstract	We present the design, implementation, and evaluation of a personalized Web information gathering agent, intended to address several shortcomings of today's centralized search engines. The potential privacy issues are addressed by a standalone client-side implementation, placing the agent under its users' administration. For personalization, we build on current text filtering and machine learning research, enabling the agent to adapt to its users' dynamic information needs. We also explore the tradeoff between performance and user friendliness, which arises due to the limited resources available to a client-side implementation. As a key improvement over existing Web agents, we separate the treatment of relevance prediction from that of document gathering, and approach each problem using the most appropriate tools. For relevance prediction, we compare two main classes of text filtering algorithms: TF-IDF (for term frequency, inverse document frequency), which measures term-count distributions within and across documents, and Bayesian, which learns individual term contributions to the overall probability of relevance. Several versions of these algorithms are implemented to assess how performance is impacted by factors such as the amount of training, availability of negative feedback, and availability of topic-labeled training samples. For document gathering, we offload the brute-force work to a large centralized search engine (e.g., Google), and focus on higher-level techniques, including generation of search queries from the agent's user profile, change detection between subsequent document versions, and tracking persistent user interests over time. We approach the problem of evaluating Web information agents from two perspectives. We use benchmark datasets for speed, convenience, and statistical significance. We also conduct user studies to assess how the aggregate system behaves in a live situation, with limited training from real users. 
Our main conclusions are that it is possible to build high-performance, lightweight text filters, especially when the operating environment facilitates users providing negative feedback; fast and efficient methods exist to detect whether a document changes in a relevant way or is made redundant by a previous document; and that relevant search engine queries can easily be extracted from a TF-IDF profile and used to supplement the incoming document stream for significantly improved recall.
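As a rough illustration of the last point above, the sketch below builds a small TF-IDF profile from documents a user marked relevant and takes its highest-weighted terms as a search query. It is only a minimal sketch, not the dissertation's implementation: the function names (`tokenize`, `tfidf_profile`, `top_query_terms`), the whitespace tokenizer, the plain `tf * log(N / df)` weighting, and the sample documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, keep alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_profile(relevant_docs, background_docs):
    """Sum tf * log(N / df) over the relevant documents (assumed weighting)."""
    all_docs = relevant_docs + background_docs
    n_docs = len(all_docs)

    # Document frequency: number of documents in which each term occurs.
    df = Counter()
    for doc in all_docs:
        df.update(set(tokenize(doc)))

    # Accumulate TF-IDF weights over the relevant documents only.
    profile = Counter()
    for doc in relevant_docs:
        for term, count in Counter(tokenize(doc)).items():
            profile[term] += count * math.log(n_docs / df[term])
    return profile


def top_query_terms(profile, k=3):
    # The k highest-weighted profile terms serve as the search-engine query.
    return [term for term, _ in profile.most_common(k)]


relevant = ["agents agents filter web", "agents learn web"]
background = ["web weather news", "weather news today"]
profile = tfidf_profile(relevant, background)
print(top_query_terms(profile, k=1))  # prints ['agents']
```

In this toy data, "web" appears in most documents and therefore receives a low weight, while "agents" is both frequent in the relevant documents and absent from the background ones, so it ranks first.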
dc.format.medium	doctoral dissertations
dc.identifier.uri	https://hdl.handle.net/10217/243451
dc.identifier.uri	https://doi.org/10.25675/3.026279
dc.language	English
dc.language.iso	eng
dc.publisher	Colorado State University. Libraries
dc.relation.ispartof	2000-2019
dc.rights	Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.rights.license	Per the terms of a contractual agreement, all use of this item is limited to the non-commercial use of Colorado State University and its authorized users.
dc.subject	computer science
dc.title	Agents for personalized client-side information gathering from the Web
dc.type	Text
dcterms.rights.dpla	This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Colorado State University
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy (Ph.D.)

Files

Original bundle

Name: ETDF_PQ_2005_3200699.pdf
Size: 4.44 MB
Format: Adobe Portable Document Format

Collections

2000-2019
Theses and Dissertations