Agents for personalized client-side information gathering from the Web

dc.contributor.author: Somlo, Gabriel L., author
dc.contributor.author: Howe, Adele E., advisor
dc.date.accessioned: 2026-02-23T19:19:20Z
dc.date.issued: 2005
dc.description.abstract: We present the design, implementation, and evaluation of a personalized Web information gathering agent, intended to address several shortcomings of today's centralized search engines. Potential privacy issues are addressed by a standalone client-side implementation, which places the agent under its users' administration. For personalization, we build on current text filtering and machine learning research, enabling the agent to adapt to its users' dynamic information needs. We also explore the tradeoff between performance and user friendliness that arises from the limited resources available to a client-side implementation. As a key improvement over existing Web agents, we separate the treatment of relevance prediction from that of document gathering, and approach each problem using the most appropriate tools. For relevance prediction, we compare two main classes of text filtering algorithms: TF-IDF (term frequency-inverse document frequency), which measures term-count distributions within and across documents, and Bayesian, which learns individual term contributions to the overall probability of relevance. Several versions of these algorithms are implemented to assess how performance is affected by factors such as the amount of training, the availability of negative feedback, and the availability of topic-labeled training samples. For document gathering, we offload the brute-force work to a large centralized search engine (e.g., Google) and focus on higher-level techniques, including generating search queries from the agent's user profile, detecting changes between subsequent document versions, and tracking persistent user interests over time. We approach the problem of evaluating Web information agents from two perspectives. We use benchmark datasets for speed, convenience, and statistical significance. We also conduct user studies to assess how the aggregate system behaves in a live situation, with limited training from real users.
Our main conclusions are that it is possible to build high-performance, lightweight text filters, especially when the operating environment makes it easy for users to provide negative feedback; that fast and efficient methods exist to detect whether a document changes in a relevant way or is made redundant by a previous document; and that relevant search engine queries can easily be extracted from a TF-IDF profile and used to supplement the incoming document stream, significantly improving recall.
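The abstract describes TF-IDF filtering as measuring term-count distributions within and across documents, and reports that search queries can be extracted from a TF-IDF profile. The Python sketch below illustrates both ideas under stated assumptions: the profile is taken to be the centroid of the TF-IDF vectors of documents a user marked relevant, relevance is scored by cosine similarity, and a query is simply the k highest-weighted profile terms. The function names (`idf_table`, `build_profile`, `extract_query`, and so on) are hypothetical and do not come from the dissertation.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real filters
    # typically also stem terms and drop stopwords).
    return text.lower().split()


def idf_table(docs):
    # Inverse document frequency: log(N / df) for every term in the corpus.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(text, idf):
    # Term counts within one document, weighted by rarity across documents.
    # Terms absent from the IDF table get weight 0.
    tf = Counter(tokenize(text))
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(a, b):
    # Cosine similarity between two sparse term -> weight vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_profile(relevant_docs, idf):
    # Hypothetical profile: centroid of the TF-IDF vectors of relevant docs.
    profile = Counter()
    for doc in relevant_docs:
        for t, w in tfidf(doc, idf).items():
            profile[t] += w / len(relevant_docs)
    return dict(profile)


def extract_query(profile, k=3):
    # Hypothetical query generation: the k highest-weighted profile terms,
    # with ties broken alphabetically so the result is deterministic.
    top = sorted(profile.items(), key=lambda item: (-item[1], item[0]))[:k]
    return " ".join(term for term, _ in top)


if __name__ == "__main__":
    corpus = [
        "web agent filters web pages",
        "bayesian spam filter",
        "client side web agent privacy",
        "cooking recipes pasta",
    ]
    idf = idf_table(corpus)
    profile = build_profile([corpus[0], corpus[2]], idf)
    print(extract_query(profile, k=3))
    print(cosine(tfidf("privacy of a web agent", idf), profile))
```

Sending the extracted terms to a search engine and filtering the returned pages against the same profile is one way to picture how query generation could supplement the incoming document stream; the dissertation's actual query-generation method may differ.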
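For contrast, the abstract describes Bayesian filters as learning each term's contribution to the overall probability of relevance, and notes that performance depends on whether negative feedback is available. A minimal sketch, assuming a two-class multinomial naive Bayes model with add-one (Laplace) smoothing: positive and negative feedback train separate term counts, and a document's score is a log prior ratio plus the sum of per-term log-likelihood ratios. The class `NaiveBayesFilter` and its methods are hypothetical, not the dissertation's implementation.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Hypothetical two-class multinomial naive Bayes relevance filter.

    Each known term adds log P(term | relevant) - log P(term | non-relevant)
    to the log-odds of relevance; add-one smoothing avoids zero probabilities.
    """

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def train(self, text, relevant):
        # Positive feedback updates the relevant class; negative feedback
        # updates the non-relevant class.
        self.counts[relevant].update(text.lower().split())
        self.docs[relevant] += 1

    def log_odds(self, text):
        vocab = set(self.counts[True]) | set(self.counts[False])
        totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        # Add-one smoothed prior ratio, so a class without feedback is not
        # treated as impossible.
        score = math.log((self.docs[True] + 1) / (self.docs[False] + 1))
        for term in text.lower().split():
            if term not in vocab:
                continue  # unseen terms carry no evidence either way
            p_rel = (self.counts[True][term] + 1) / (totals[True] + len(vocab))
            p_non = (self.counts[False][term] + 1) / (totals[False] + len(vocab))
            score += math.log(p_rel / p_non)
        return score

    def is_relevant(self, text, threshold=0.0):
        # Positive log-odds means "more likely relevant than not".
        return self.log_odds(text) > threshold


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("client side web agent", relevant=True)
    f.train("web search agent privacy", relevant=True)
    f.train("pasta recipes cooking", relevant=False)
    print(f.is_relevant("privacy agent"), f.is_relevant("pasta recipes"))
```

In this sketch the non-relevant class learns only from negative feedback, so a user who never marks documents as irrelevant leaves it without term counts; that gives one intuition for why the availability of negative feedback can matter.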
dc.format.medium: doctoral dissertations
dc.identifier.uri: https://hdl.handle.net/10217/243451
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2000-2019
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.rights.license: Per the terms of a contractual agreement, all use of this item is limited to the non-commercial use of Colorado State University and its authorized users.
dc.subject: computer science
dc.title: Agents for personalized client-side information gathering from the Web
dc.type: Text
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (Ph.D.)

Files

Original bundle

Name: ETDF_PQ_2005_3200699.pdf
Size: 4.44 MB
Format: Adobe Portable Document Format