### Recent Submissions

#### Graph Properties from Restricted Information (2024-05)

Sengupta, Rik

There are several settings in theoretical computer science where we gain some form of limited access to an unknown underlying graph (typically through a subgraph) and try to infer something fundamental about the entire graph. This question of learning a structure from partial information is the cornerstone of many active areas of research. In this thesis, we study two very different models for learning graph properties, and we show some surprising commonalities and differences between the two settings. In the first, probabilistic setting, we ask: suppose we have a fixed, unknown graph that is passed through a "noisy" channel, which deletes vertices with some uniform probability, deletes (or flips) the remaining edges with some uniform probability, and returns the resulting subgraph. How many such samples (or "traces") would we need to reconstruct the original graph? We show several results in this setting, including separations between random and arbitrary graphs, and between vertex and edge deletions, even when all but an o(1)-fraction of the graph disappears in every trace. In the second, deterministic setting, we ask: how can we identify graphs efficiently using logical characterizations? This question is inextricably tied to complexity theory. One technique for showing lower bounds in these realms is two-player logical games, in which two perfect players (called "Spoiler" and "Duplicator") play pebble games on graphs and can argue only about the restricted "views" given by the induced subgraphs on those pebbles. We generalize the known games into a flexible framework characterizing any reasonable syntactic measure; we also play these games on canonical ordered structures such as linear orders and strings, historically a notoriously difficult endeavor. Here we prove some new bounds, some of which we show to be optimal.
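The deletion channel described above is straightforward to simulate. The sketch below is a minimal illustration, with the graph representation, probabilities, and function name all chosen for the example rather than taken from the thesis:

```python
import random

def trace(vertices, edges, p_vertex, p_edge, rng=random):
    """Sample one noisy 'trace' of a graph: each vertex survives
    independently with probability 1 - p_vertex, and each edge between
    surviving vertices is then deleted with probability p_edge."""
    kept = {v for v in vertices if rng.random() > p_vertex}
    kept_edges = {(u, w) for (u, w) in edges
                  if u in kept and w in kept and rng.random() > p_edge}
    return kept, kept_edges

# A 4-cycle as a toy input graph
V = {0, 1, 2, 3}
E = {(0, 1), (1, 2), (2, 3), (0, 3)}
sample_v, sample_e = trace(V, E, p_vertex=0.5, p_edge=0.1)
```

A reconstruction algorithm would then aggregate many such traces; the thesis's results concern how many traces are needed.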
Despite coming from quite distinct areas of research, the two problems show several notable similarities. Both ask about the fundamental structure of graphs while having access only to partial "views" of the graph. Both rely on distinguishing a given pair of graphs as the main subroutine, and thereafter draw conclusions about identifying entire classes of graphs. Both exploit isomorphism testing as an intermediate subroutine. Both exhibit bottlenecks in the form of "near-isomorphisms" that are not true isomorphisms. And finally, both exhibit a remarkable phenomenon in which an inherent *ordering* on the universe makes the question provably easier than having no such ordering.

#### Exploring Human-Centered AI Storytelling (2024-05)

Akoury, Nader S

Large language models (LLMs) have ushered in a multitude of new language generation capabilities, bringing AI-guided storytelling much closer to reality. Authors can now meaningfully engage with AI writing assistants during the story writing process. These same capabilities have also made it possible to add more diverse and complex interactions to narrative-driven role-playing games. However, due to the memory and processing that LLMs require, efficient inference techniques are needed, as video games are real-time systems with demanding performance requirements. In this thesis I explore two overarching lines of inquiry: (1) What is the user experience with systems designed for language generation in storytelling and video games? and (2) How can we improve these systems to address limitations that adversely affect user experience? I investigate these questions through the lens of AI story writing assistants and video game dialogue systems. In Chapter 2 I discuss my explorations using LLMs as AI story writing assistants on the online collaborative writing platform Storium, where real authors on the platform can query a model for suggested story continuations.
Then in Chapter 3, I extract dialogue from the widely acclaimed role-playing game Disco Elysium: The Final Cut, which contains 1.1M words of dialogue spread across a complex graph of utterances where node reachability depends on game state. Using a reusable research artifact (a web app that recreates the dialogue system from the game), I explore real players' experiences interacting with the game augmented by LLM-generated dialogue. In a natural follow-up, in Chapter 4 I examine how to enhance the player experience, while maintaining the game's existing structure, by introducing a virtual game master (GM) that allows players to type their desired response in a conversation rather than choose from a set of pre-written options. To address the response time considerations of these real-time systems, in Chapter 5 I investigate efficient inference for the Transformer architecture by incorporating linguistic features into the decoding process. I conclude with Chapter 6, where I consider promising future directions for improving the virtual GM and ways of integrating LLMs into the video game dialogue writing process.

#### Retrieval Augmented Representation Learning for Information Retrieval (2024-05)

Hashemi, Helia

Information retrieval (IR) is a scientific discipline within the fields of computer and information sciences that enables billions of users to efficiently access the information they need. Applications of information retrieval include, but are not limited to, search engines, question answering, and recommender systems. Decades of IR research have demonstrated that learning accurate query and document representations plays a vital role in the effectiveness of IR systems. State-of-the-art representation learning solutions for information retrieval rely heavily on deep neural networks. However, despite their effective performance, current approaches are not optimal for all IR settings.
For example, information retrieval systems often deal with inputs that are not clear and self-sufficient, e.g., many queries submitted to search engines. In such cases, current state-of-the-art models cannot learn an optimal representation of the input, or even an accurate set of all representations. To address this major issue, we develop novel approaches that augment neural representation learning models with a retrieval module that guides the model towards learning more effective representations. We study our retrieval augmentation approaches in a diverse set of novel and emerging information retrieval applications. First, we introduce Guided Transformer, an extension of the Transformer network that adjusts the input representations using multiple documents provided by a retrieval module, and demonstrate its effectiveness in learning representations for conversational search problems. Next, we propose novel representation learning models that learn multiple representations for queries that may carry multiple intents, including ambiguous and faceted queries. To do so, we also introduce a novel optimization approach that enables encoder-decoder architectures to generate a permutation-invariant set of query intents. Furthermore, we study retrieval-augmented data generation for domain adaptation in IR, which concerns applying a retrieval model trained on a source domain to a target domain that often suffers from a lack of training data. We introduce a novel adaptive IR task in which only a textual description of the target domain is available. We define a taxonomy of domain attributes in information retrieval to identify different properties of a source domain that can be adapted to a target domain, and we introduce a novel automatic data construction pipeline for adapting dense retrieval models to the target domain.
We believe that the applications of the developed retrieval augmentation methods can be expanded to many more real-world IR tasks.

#### Unlocking Natural Language Generalization with Adaptive Retrieval-based Methods (2024-05)

Drozdov, Andrew

Progress in large language model (LLM) training and inference has contributed to the emergence of "generative retrieval" as a new sub-topic in the field of artificial intelligence. Generative retrieval encapsulates a family of methods that leverage the strengths of generative language models for information retrieval applications. These methods are particularly useful when embedded in natural language interfaces such as conversational chatbots, which represent an extreme diversity of fine-grained tasks requiring models both to adapt quickly and to generate fluent, relevant responses. In this dissertation, I propose four general methods to further advance the capabilities of generative retrieval systems:

1. I introduce a method for effective adaptation of large language models for retrieval through in-context learning. This technique leverages task-specific demonstrations to quickly learn to rank candidate passages. The criterion for demonstration selection is based on "demonstration difficulty" and is inspired by gradient-based learning, where difficult and informative data points often lead to higher-magnitude gradients.

2. Generative retrieval enables a massive variety of tasks, including retrieval over structured data. Inspired by previous methods for learning compositional structure with recursive computation, I develop a novel extension of least-to-most prompting that dynamically selects demonstrations to cover the many aspects of the input query. This approach leads to state-of-the-art results on a challenging compositional generalization benchmark translating text to a SQL-like query language.

3. Retrieving relevant documents from an external datastore is an effective way for language models to ground their predictions externally rather than relying solely on their internal memory. I design an adaptive algorithm that discards distracting or irrelevant documents and more heavily weights the influence of relevant text. This more precise use of the datastore leads to state-of-the-art performance on a language modeling benchmark for generating encyclopedic text.

4. During retrieval-augmented generation (RAG), many atomic facts are generated that pertain to only a subset of the retrieved passages, leading to inefficient use of the limited prompt context. I introduce a Retrieval-Driven Memory Manager (ReDMM) for RAG that adaptively selects which passages to include at each step of generation, bypassing context length limits. ReDMM is particularly helpful for generating complex answers, as measured by a suite of benchmarks for long-form question answering.
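The passage-selection idea behind a memory manager of this kind can be sketched as follows. The term-overlap scorer, token budget, and greedy policy below are toy stand-ins for illustration only, not the method from the dissertation:

```python
def select_passages(query_terms, passages, budget_tokens):
    """Greedily pick the highest-scoring passages that fit in the prompt
    budget. Scoring is simple term overlap -- a toy stand-in for a
    learned relevance model."""
    def score(p):
        return len(set(p.split()) & set(query_terms))
    chosen, used = [], 0
    for p in sorted(passages, key=score, reverse=True):
        cost = len(p.split())  # crude token count
        if used + cost <= budget_tokens and score(p) > 0:
            chosen.append(p)
            used += cost
    return chosen

passages = [
    "the eiffel tower is in paris",
    "bananas are rich in potassium",
    "paris is the capital of france",
]
picked = select_passages(["paris", "france"], passages, budget_tokens=8)
```

At each generation step the selection could be re-run against the current query state, so different passages occupy the fixed context window over time.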

I demonstrate that these methods address limitations of previous generative retrieval systems and provide a path forward for more effective language model use.

#### Efficient k-Nearest Neighbor Search with Black-Box Neural Similarity Functions (2024-05)

Yadav, Nishant

The goal of k-nearest neighbor (k-NN) search is to find the top-k most similar items for a query under a given similarity function. k-NN search is a widely used subroutine in search, recommendation, question answering, and many other applications in machine learning and information retrieval, where it improves the performance and robustness of models and adapts models to new domains. For many of these applications, the state-of-the-art query-item similarity models are black-box neural similarity functions such as cross-encoders, which jointly encode a query-item pair to directly output a scalar similarity. However, unlike vector-based similarity functions (e.g., inner product), computing a single query-item score with a cross-encoder can be computationally expensive, as cross-encoders are typically parameterized by deep neural models such as transformers. For this reason, existing approaches perform k-NN search with cross-encoders using (heuristic) retrieve-and-rerank pipelines that retrieve with a separate model (such as a dual-encoder or BM25) and then re-rank with the cross-encoder. In this thesis, we propose efficient matrix factorization-based approaches for fitting query and item embeddings to approximate cross-encoder scores for query-item pairs, and use the approximate scores to perform k-NN search with cross-encoders. First, we introduce ANNCUR, a CUR-decomposition-based method that computes latent item embeddings by comparing each item with a set of anchor queries.
At test time, ANNCUR computes the test query embedding by comparing the test query with only a small set of anchor items chosen uniformly at random, and performs retrieval using approximate scores computed as the dot product of query and item embeddings. We next propose ADACUR, which further improves the test-time recall-vs-cost trade-off by comparing the test query with an incrementally and adaptively chosen set of anchor items, conditioned on the test query. The indexing step for ANNCUR and ADACUR computes a dense matrix by scoring all items against a set of anchor/train queries. To reduce indexing complexity and scale to millions of items, we propose methods based on factorization of sparse matrices containing cross-encoder scores between a set of train queries and all the items. Our proposed methods are significantly more efficient than CUR-based approaches at indexing the set of items, and they efficiently leverage existing dual-encoder models while avoiding expensive distillation-based training of dual-encoders. We perform k-NN search for a given query using the approximate scores with an adaptive search strategy that retrieves over multiple rounds and uses feedback on retrieved items to improve the cross-encoder approximation, and hence retrieval, in subsequent rounds. Empirically, our proposed approaches provide significant improvements in recall-vs-latency trade-offs over existing retrieve-and-rerank pipelines on zero-shot entity linking and information retrieval benchmarks. In the final chapter, we propose loss functions and strategies for training cross-encoder models to improve the k-NN search performance of the proposed methods, making the resulting cross-encoder score matrix easier to approximate without affecting the accuracy of the cross-encoder similarity on downstream tasks.
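The embedding-fitting idea behind these CUR-style methods can be illustrated on a synthetic low-rank score matrix. This sketch mirrors the recipe in spirit (anchor-query scores as item embeddings, a small number of anchor-item scores to fit the test-query embedding), but the matrix sizes, the synthetic scores, and the use of least squares are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_queries, n_items, rank = 8, 50, 3

# Synthetic low-rank stand-in for a cross-encoder score matrix S[q, i]
S = rng.normal(size=(n_queries, rank)) @ rng.normal(size=(rank, n_items))

# Indexing: score every item against a few anchor queries;
# those score columns become the latent item embeddings.
anchor_q = [0, 1, 2, 3, 4]
item_emb = S[anchor_q].T                       # (n_items, len(anchor_q))

# Test time: call the expensive scorer on a few random anchor items only,
# then fit a query embedding by least squares against their embeddings.
test_q = 7
anchor_items = rng.choice(n_items, size=10, replace=False)
observed = S[test_q, anchor_items]
q_emb, *_ = np.linalg.lstsq(item_emb[anchor_items], observed, rcond=None)

# Approximate scores for all items via dot products, then take the top-k.
approx = item_emb @ q_emb
top5 = np.argsort(-approx)[:5]
```

Because the toy score matrix is exactly low-rank, the recovered scores match the true ones; real cross-encoder matrices are only approximately low-rank, which is where the recall-vs-cost trade-off arises.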
We also use the proposed k-NN search methods to dynamically sample negative items/examples while training cross-encoders, improving the robustness of cross-encoder models on downstream tasks.

#### Fast Linear Algebra for Gaussian Processes (2024-05)

Yadav, Mohit

In machine learning, uncertainty calibration and prediction interpretability are crucial. Gaussian processes (GPs) are widely recognized for their ability to model uncertainty and for their interpretability. However, their practical application is often limited by the computational intensity of operations like matrix inversion and determinant computation, which scale cubically with the number of data points. This thesis focuses on developing fast algorithms to tackle this computational challenge for GPs and enhance their scalability to large-scale datasets.

The first two chapters focus on the structured kernel interpolation (SKI) framework, which interpolates the kernel matrix using a dense grid in the input space and exploits iterative algorithms to accelerate GPs. First, we present a novel and fast iterative algorithm for performing GP inference within the SKI framework. After an initial computation that is linear in the number of data points, the remaining sequence of computations scales independently of the dataset size. Our method speeds up GP inference on several low-dimensional datasets.
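The core SKI computation, replacing the dense kernel matrix with interpolation onto a grid kernel, can be sketched in one dimension. The RBF kernel, grid size, and linear interpolation weights below are illustrative choices, not the thesis's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 50                         # data points, grid points
x = np.sort(rng.uniform(0, 1, n))
grid = np.linspace(0, 1, m)

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel matrix between two point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

# Sparse linear-interpolation weights W: each row has two nonzeros,
# mapping each data point onto its two neighboring grid points.
W = np.zeros((n, m))
j = np.clip(np.searchsorted(grid, x) - 1, 0, m - 2)
t = (x - grid[j]) / (grid[1] - grid[0])
W[np.arange(n), j] = 1 - t
W[np.arange(n), j + 1] = t

# SKI matrix-vector product: K v is approximated by W K_grid W^T v.
# In practice K_grid has structure (e.g. Toeplitz) that makes this fast.
K_grid = rbf(grid, grid)
v = rng.normal(size=n)
fast = W @ (K_grid @ (W.T @ v))
exact = rbf(x, x) @ v
```

Iterative solvers such as conjugate gradients then need only this matrix-vector product, never the dense n-by-n kernel matrix.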

Unfortunately, SKI's scalability diminishes as the grid size grows exponentially with the input dimension. To mitigate this, we integrate sparse grids into the SKI framework, owing to their accurate interpolation and to their size growing far more slowly than that of dense grids as the number of dimensions rises. We then introduce a novel matrix-vector multiplication algorithm for sparse-grid kernel matrices, improving SKI's scalability to higher dimensions; for example, we can scale GP inference to eleven dimensions with over five million points.

The final chapter explores GPs in bandit algorithms for optimizing the ranking of top-k items on platforms like online marketplaces and search engines. We introduce a contextual bandit algorithm using GPs with Kendall kernels, which sidesteps the restrictive assumptions typically required for reward feedback, addressing the challenge of many options. Additionally, we develop a fast algorithm for linear algebraic operations on the kernel matrix over top-k rankings, utilizing a sparse representation of the Kendall kernel. This method reduces inference time, leading to faster bandit algorithms with reduced latency.

#### Sublinear Algorithms for Matrices: Theory and Applications (2024-05)

Ray, Archan

Matrices are ubiquitous mathematical structures that arise throughout computer science. We study fast algorithms for several central problems involving matrices, including eigenvalue approximation, spectral approximation, and low-rank approximation. In particular, we focus on sublinear time or sublinear query algorithms that can scale to very large matrices. We focus on developing algorithms with theoretical bounds and demonstrate the applicability of these algorithms on real-world datasets. We first present a simple randomized algorithm for approximating *all* the eigenvalues of a bounded-entry matrix to a small additive error by querying a small random submatrix of the input matrix. Next, we give the first sublinear query deterministic algorithms that can approximate any symmetric matrix $\mathbf{A}$ in the spectral norm, i.e., that output $\tilde{\mathbf{A}}$ where $\|\mathbf{A}-\tilde{\mathbf{A}}\|_2$ is bounded. Using this result, we give the first deterministic algorithm that can approximate all the singular values of any symmetric matrix to small additive error in faster than matrix multiplication time. We further extend the above results by improving the query complexity in the case when $\mathbf{A}$ is positive semidefinite (PSD) with entries in $\{-1,0,1\}$.
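The random-submatrix idea for eigenvalue approximation can be made concrete on a toy bounded-entry matrix. The rank-1 planted example and the n/s rescaling below are an illustrative sketch of the sampling approach, not the thesis's algorithm verbatim:

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 400, 120

# Bounded-entry symmetric test matrix with one large planted eigenvalue:
# A = v v^T for a random sign vector v, so eig(A) = {n, 0, ..., 0}.
v = rng.choice([-1.0, 1.0], size=n)
A = np.outer(v, v)

# Query only a random s x s principal submatrix and rescale its
# eigenvalues by n/s to estimate the large eigenvalues of A.
idx = rng.choice(n, size=s, replace=False)
est = (n / s) * np.linalg.eigvalsh(A[np.ix_(idx, idx)])

true_top = np.linalg.eigvalsh(A).max()   # = n for this example
```

Only s^2 of the n^2 entries are ever read; for this rank-1 example the rescaled top eigenvalue of the submatrix recovers the planted eigenvalue, while in general such estimates carry an additive error that shrinks as s grows.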
Then we empirically analyze eigenvalue approximation of symmetric matrices using various matrix-vector query algorithms, exploring the trade-offs between adaptivity and query complexity. This study complements our work on algorithms that read a sublinear number of entries of the input matrix. Finally, we present a generalization of the Nyström method for low-rank approximation of general symmetric matrices, and we conduct a comparative study of this generalized Nyström method and other sublinear algorithms for computing low-rank approximations of similarity matrices arising in natural language processing (NLP) tasks.

#### Towards Effective Modeling of Long-range Context (2024-05)

Sun, Simeng

At the core of recent advancements in natural language processing are language models, which are trained to predict the next token given the preceding context. Recent developments in deep learning have led to the efficient scaling of the context window in Transformer-based language models. Despite this progress, these models still exhibit severe limitations when tackling long-context tasks, such as book-level summarization and long-document question answering. While the context window size has been continuously increasing, there is a lack of understanding of how these models utilize long-range context, i.e., context that spans at least several thousand tokens. We therefore first provide an analysis of long-range context modeling with both perplexity and segment-level task evaluations. Our results show that perplexity, the most commonly used intrinsic metric for language model evaluation, may obscure the evaluation of long-range context modeling. In contrast, segment-level evaluation, which involves computing the probability of a sequence of tokens rather than a single token as done in perplexity, proves to be a more suitable method for evaluating long-range context modeling.
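The distinction between the two evaluations can be made concrete with toy token log-probabilities; the numbers below are invented for illustration:

```python
import math

# Hypothetical next-token log-probabilities a model assigns to a
# 6-token gold continuation of some long context.
token_logprobs = [-2.1, -0.3, -4.0, -0.5, -0.2, -3.3]

# Perplexity: exponentiated average per-token negative log-likelihood.
perplexity = math.exp(-sum(token_logprobs) / len(token_logprobs))

# Segment-level score: the joint log-probability of the whole segment,
# which can be compared across candidate continuations (e.g., the gold
# next chapter versus a distractor).
segment_logprob = sum(token_logprobs)
distractor_logprob = sum([-3.0, -2.8, -3.5, -2.9, -3.1, -3.2])
prefers_gold = segment_logprob > distractor_logprob
```

Perplexity averages over tokens, so a model can score well while assigning similar probability to gold and distractor segments; comparing whole-segment log-probabilities surfaces whether long-range context actually changed the model's preference.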
Based on this finding, we enhance segment-level evaluation by proposing a challenge dataset, ChapterBreak, and demonstrate that SuffixLM, a model trained with segment-level signals, outperforms the standard token-level language model on this task. The limited context modeling capability prompts us to investigate new ways to improve recent large language models. To this end, we first develop a prompting framework, PEARL, that leverages large instruction fine-tuned language models to decompose complex reasoning into executable plans. We demonstrate the efficacy of PEARL on a subset of a long-document QA dataset in which the correct answer depends on long-range context rather than a short excerpt. Our second approach builds on the benefits of modeling context at the segment level. Concretely, we propose a new training method, SuffixRL, which fine-tunes a token-level language model directly using segment-level signals. We show that training models with SuffixRL leads to more natural and coherent continuations in an open-ended generation setting. Finally, we conclude this thesis by identifying seven concrete topics that hold promise for future exploration. We hope this thesis can spur more principled research in long-context modeling.

#### Extracting Token-level Semantic Matching in Text-pair Classification Tasks (2024-05)

Kim, Youngwoo

This dissertation presents approaches to obtain interpretability and extract token-level semantics from transformer-based text-pair classification models. We focus on both token-level task-solving and model explanation for natural language inference (NLI) and information retrieval tasks. We hypothesize that if these models can successfully address text-pair classification problems, they must inherently possess the capacity to solve the corresponding token-level problems to a certain degree.
However, the objective is not only to transform text-pair classification solutions into the token-level inferences that are prerequisite for these tasks, but also to explain the decision-making processes and behavior of these models. The first half of the dissertation comprises two parts focused on deriving interpretability from neural NLI models. The initial part proposes a sequence labeling task called classification role labeling (CRL) to represent token-level semantic understanding in NLI. The goal is to label each token in the text pair based on its semantic alignment and whether it contributes to a contradiction. We show that such sequence labeling models can be trained by weak supervision from an NLI classification model. The subsequent part studies the use of CRL for explaining contradictory claims from biomedical articles, demonstrating the effectiveness of our novel model, PAT, on the Cond-NLI dataset. The second half of the dissertation spans two parts targeting the ad-hoc retrieval task, specifically explaining the mechanism behind query-document relevance scoring functions. One part investigates local alignment rationales for explaining query-document relevance classification from a black-box model, proposing perturbation-based metrics to evaluate alignment rationale quality. In the other part, we provide global explanations for neural ranking models by representing their semantic matching behavior as a "relevance thesaurus" containing semantically related query-term and document-term pairs. This thesaurus can reveal corpus-specific features and biases, supporting the utility of our explanation method.
Overall, this four-part dissertation introduces novel approaches to complement interpretability in neural text-pair classification models, extracting token-level semantics and alignment rationales without the need for additional human annotations, while also providing insights into the models' decision-making processes.

#### A Descriptive Approach to the Class NP (1997-02)

Medina-Peralta, Jose Antonio

Descriptive complexity is the study of the expressive power of logical languages. There exists a close relationship between the expressive power of a logical language and the computational complexity of the properties captured by such a language. R. Fagin proved that every sentence in second-order existential logic expresses a problem that can be decided by a non-deterministic polynomial time Turing machine: SOƎ = NP [9]. With this fact as our starting point, we study the sentences that express properties that are complete for the class NP via first-order projections (fops). This type of reduction arises naturally in descriptive complexity, and it is known that all problems complete for NP via fops are FO-isomorphic [1]. Our study of SOƎ sentences includes a normal form for sentences that describe NP-complete properties, the development of syntactic tools for proving problems NP-complete via fops, the definition of syntactic families of problems that have a similar syntactic structure, the study of the approximability of the problems in the syntactic families that we define, and a descriptive version of the PCP theorem. This dissertation is organized in seven chapters. In Chapter 1, we present an overview of our research area and results. In Chapter 2, we review the concepts and definitions of descriptive complexity. In Chapter 3, we describe a normal form for SOƎ sentences that define NP-complete problems via fops.
In Chapter 4, we present syntactic tools to prove problems NP-complete via fops and use them to prove that a large number of the known NP-complete problems remain complete via fops. Among these tools, we define families of problems with similar syntactic structure. In Chapter 5, we study the approximation properties of some of the families defined in Chapter 4. In Chapter 6, we prove a descriptive version of the PCP theorem; this descriptive version implies both the computational PCP theorem and Fagin's theorem. We close this work with Chapter 7, presenting our conclusions and suggestions for future research motivated by this dissertation.

#### Improved Network Consistency and Connectivity in Mobile and Sensor Systems (2009-09)

Banerjee, Nilanjan

Edge networks such as sensor, mobile, and disruption-tolerant networks suffer from topological uncertainty and disconnections due to a myriad of factors, including limited battery capacity on client devices and mobility. Hence, providing reliable, always-on consistency for network applications in such mobile and sensor systems is non-trivial and challenging. However, the problem is of paramount importance given the proliferation of mobile phones, PDAs, laptops, and music players. This thesis identifies two fundamental deterrents to addressing the above problem. First, limited energy on client mobile and sensor devices makes high levels of consistency and availability impossible. Second, unreliable support from the network infrastructure, such as coverage holes in WiFi, degrades network performance. We address these two issues in this dissertation through modifications at both the client and the infrastructure. The first part of this thesis proposes a novel energy management architecture called Hierarchical Power Management (HPM). HPM combines platforms with diverse energy needs and capabilities into a single integrated system to provide high levels of consistency and availability at minimal energy consumption.
We present two systems, Triage and Turducken, which are instantiations of HPM for sensor-net microservers and laptops, respectively. The second part of the thesis proposes and analyzes the use of additional infrastructure in the form of relays, mesh nodes, and base stations to enhance sparse and dense mobile networks. We present the design, implementation, and deployment of Throwboxes, a relay system to enhance sparse mobile networks, and an associated system for enhancing WiFi-based mobile networks.

#### Online Management of Resilient and Power Efficient Multicore Processors (2013-09)

Rodrigues, Rance

The semiconductor industry has been driven by Moore's law for almost half a century. Miniaturization of device size has allowed more transistors to be packed into a smaller area, while improved transistor performance has resulted in a significant increase in frequency. Increased device density and rising frequency led, unfortunately, to a power density problem that became an obstacle to further integration. The processor industry responded by lowering processor frequency and integrating multiple processor cores on a die, focusing on Thread Level Parallelism (TLP) for performance instead of traditional Instruction Level Parallelism (ILP). While continued scaling of devices has provided unprecedented integration, it has also led to a few serious problems. The first is the increasing rate of system failures due to soft errors and aging defects. Soft errors are caused by ionizing radiation originating from radioactive contaminants or from the secondary release of charged particles by cosmic neutrons. Ionizing radiation may charge or discharge a storage node, causing bit flips that may result in a system failure. In this dissertation, we propose solutions for online detection of such errors in microprocessors. A small and functionally limited core called the Sentry Core (SC) is added to the multicore.
It monitors the operation of the functional cores in the multicore and, whenever deemed necessary, opportunistically initiates Dual Modular Redundancy (DMR) to test the operation of those cores. This scheme allows detection of potential core failure at a small hardware overhead. In addition to detecting soft errors, this solution is also capable of detecting errors introduced by device aging that result in operational failure, and it is further extended to verify cache coherence transactions. A second problem we address in this dissertation relates to power concerns. While the multicore solution addresses the power density problem, overall power dissipation is still limited by packaging and cooling technologies, which limits the number of cores that can be integrated for a given package specification. One way to improve performance within this constraint is to reduce the power dissipation of individual cores without sacrificing system performance. Prior solutions to this objective involve Dynamic Voltage and Frequency Scaling (DVFS) and the use of sleep states, both of which take advantage of coarse-grained variation in the demand for computation. In this dissertation, we propose techniques to maximize the performance-per-watt of multicores at a fine-grained time scale, and we propose multiple alternative architectures to attain this goal. One such architecture is the Asymmetric Multicore Processor (AMP). AMPs have been shown to outperform symmetric ones in terms of performance and performance-per-watt for a fixed resource and power budget. However, the effectiveness of these architectures depends on accurate thread-to-core scheduling. To address this problem, we propose online thread scheduling solutions that respond to the changing computational requirements of the threads. Another solution we consider is for Symmetric Multicore Processors (SMPs).
Here we target sharing of large and underutilized resources between pairs of cores. While such architectures have been explored in the past, the evaluations were incomplete. Due to sharing, the shared resource sometimes becomes a bottleneck, resulting in significant performance loss. To mitigate such loss, we propose Dynamic Voltage and Frequency Boosting (DVFB) of the shared resources, which we find significantly mitigates performance loss in times of contention. We also explore performance-per-watt improvement of individual cores in a multicore, based on dynamic reconfiguration of individual cores to run alternately in out-of-order (OOO) and in-order (InO) modes, adapting dynamically to workload characteristics. This solution significantly improves power efficiency without compromising overall performance. Thus, in this dissertation we propose solutions for several important problems to facilitate the continued scaling of processors. Specifically, we address challenges in the area of reliability of computation and propose low-power design solutions to address power constraints.

#### Reconfigurable Technologies for Next Generation Internet and Cluster Computing (2013-09)

Unnikrishnan, Deepak C.

Modern web applications are marked by distinct networking and computing characteristics. As applications evolve, they continue to operate over a large monolithic framework of networking and computing equipment built from general-purpose microprocessors and Application-Specific Integrated Circuits (ASICs) that offers few architectural choices. This dissertation presents techniques to diversify the next-generation Internet infrastructure by integrating Field-Programmable Gate Arrays (FPGAs), a class of reconfigurable integrated circuits, with general-purpose microprocessor-based techniques.
Specifically, our solutions are demonstrated in the context of two applications - network virtualization and distributed cluster computing. Network virtualization enables the physical network infrastructure to be shared among several logical networks to run diverse protocols and differentiated services. The design of a good network virtualization platform is challenging because the physical networking substrate must scale to support several isolated virtual networks with high packet-forwarding rates while offering sufficient flexibility to customize networking features. The first major contribution of this dissertation is a novel high-performance heterogeneous network virtualization system that integrates FPGAs and general-purpose CPUs. Salient features of this architecture include the ability to scale the number of virtual networks in an FPGA using existing software-based network virtualization techniques, the ability to map virtual networks to a combination of hardware and software resources on demand, and the ability to use off-chip memory resources to scale virtual router features. Partial reconfiguration has been exploited to dynamically customize virtual networking parameters. An open software framework to describe virtual networking features using a hardware-agnostic language has been developed. Evaluation of our system using a NetFPGA card demonstrates one to two orders of magnitude higher throughput than state-of-the-art network virtualization techniques. The demand for greater computing capacity grows as web applications scale. In state-of-the-art systems, an application is scaled by parallelizing the computation on a pool of commodity hardware machines using distributed computing frameworks. Although this technique is useful, it is inefficient because the sequential nature of execution in general-purpose processors does not suit all workloads equally well.
Iterative algorithms form a pervasive class of web and data-mining algorithms that execute poorly on general-purpose processors due to the presence of strict synchronization barriers in distributed cluster frameworks. This dissertation presents Maestro, a heterogeneous distributed computing framework that demonstrates how FPGAs can break down such synchronization barriers using asynchronous accumulative updates. These updates allow intermediate results to be accumulated for numerous data points without the need for iteration-based barriers. The benefits of a heterogeneous cluster are illustrated by executing a general class of iterative algorithms on a cluster of commodity CPUs and FPGAs. Computation is dynamically prioritized to accelerate algorithm convergence. We implement three iterative algorithms of this general class on a cluster of four FPGAs. A speedup of 7× is achieved over an implementation of asynchronous accumulative updates on a general-purpose CPU. The system offers a 154× speedup versus a standard Hadoop-based CPU-workstation cluster. Improved performance is thus achieved by clusters of FPGAs.Publication Semantically Grounded Learning from Unstructured Demonstrations(2013-09) Niekum, Scott D.Robots exhibit flexible behavior largely in proportion to their degree of semantic knowledge about the world. Such knowledge is often meticulously hand-coded for a narrow class of tasks, limiting the scope of possible robot competencies. Thus, the primary limiting factor of robot capabilities is often not the physical attributes of the robot, but the limited time and skill of expert programmers. One way to deal with the vast number of situations and environments that robots face outside the laboratory is to provide users with simple methods for programming robots that do not require the skill of an expert.
For this reason, learning from demonstration (LfD) has become a popular alternative to traditional robot programming methods, aiming to provide a natural mechanism for quickly teaching robots. By simply showing a robot how to perform a task, users can easily demonstrate new tasks as needed, without any special knowledge about the robot. Unfortunately, LfD often yields little semantic knowledge about the world, and thus lacks robust generalization capabilities, especially for complex, multi-step tasks. To address this shortcoming of LfD, we present a series of algorithms that draw from recent advances in Bayesian nonparametric statistics and control theory to automatically detect and leverage repeated structure at multiple levels of abstraction in demonstration data. The discovery of repeated structure provides critical insights into task invariants, features of importance, high-level task structure, and appropriate skills for the task. This culminates in the discovery of semantically meaningful skills that are flexible and reusable, providing robust generalization and transfer in complex, multi-step robotic tasks. These algorithms are tested and evaluated using a PR2 mobile manipulator, showing success on several complex real-world tasks, such as furniture assembly.Publication Optimizing Linear Queries Under Differential Privacy(2013-09) Li, ChaoPrivate analysis of statistical data has been addressed in much recent literature. The goal of such analysis is to measure statistical properties of a database without revealing information about the individuals who participate in it. Differential privacy is a rigorous privacy definition that protects individual information using output perturbation: a differentially private algorithm produces statistically indistinguishable outputs whether or not the database contains a tuple corresponding to any given individual.
It is straightforward to construct differentially private algorithms for many common tasks, and there are published algorithms to support various tasks under differential privacy. However, methods to design error-optimal algorithms for most non-trivial tasks are still unknown. In particular, we are interested in error-optimal algorithms for sets of linear queries. A linear query is a sum of counts of tuples that satisfy a certain condition, which covers many aggregation tasks including count, sum, and histogram. We present the matrix mechanism, a novel mechanism for answering sets of linear queries under differential privacy. The matrix mechanism makes a clear distinction between the set of queries submitted by users, called the query workload, and an alternative set of queries to be answered under differential privacy, called the query strategy. The answer to the query workload can then be computed from the answer to the query strategy. Given a query workload, the query strategy determines the distribution of the output noise, and the power of the matrix mechanism comes from adaptively choosing a query strategy that minimizes the output noise. Our analyses also provide a theoretical measure of the quality of different strategies for a given workload. This measure is then used in exact and approximate formulations of the optimization problem that outputs the error-optimal strategy. We present a lower bound on the error of answering each workload under the matrix mechanism. The bound reveals that the hardness of a query workload is related to the spectral properties of the workload when it is represented in matrix form. In addition, we design an approximation algorithm whose generated strategies outperform state-of-the-art mechanisms under (epsilon, delta)-differential privacy. Those strategies lead to more accurate data analysis while preserving a rigorous privacy guarantee.
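As background for the mechanisms above, here is a minimal sketch of answering a single linear query (a count) under epsilon-differential privacy with the standard Laplace mechanism, the basic output-perturbation building block that the matrix mechanism generalizes. The toy database and query are invented for illustration.

```python
import math
import random

def laplace_noise(scale):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(rows, predicate, epsilon):
    """Answer one linear query (a count of rows satisfying `predicate`)
    under epsilon-differential privacy. A count has sensitivity 1, since
    adding or removing one tuple changes it by at most 1, so Laplace
    noise with scale 1/epsilon suffices."""
    true_answer = sum(1 for row in rows if predicate(row))
    return true_answer + laplace_noise(1.0 / epsilon)

random.seed(0)
ages = [23, 35, 41, 29, 52, 60, 33]   # toy database (hypothetical)
print(private_count(ages, lambda a: a >= 40, epsilon=1.0))

# The noise is zero-mean, so repeated noisy answers average near the
# true count (3); smaller epsilon means more noise and more privacy.
avg = sum(private_count(ages, lambda a: a >= 40, 1.0)
          for _ in range(20000)) / 20000
print(round(avg))
```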
Moreover, we combine the matrix mechanism with a novel data-dependent algorithm, which achieves differential privacy by adding noise that is adapted to the input data and to the given query workload.Publication Exploring Privacy and Personalization in Information Retrieval Applications(2013-09) Feild, Henry A.A growing number of information retrieval applications rely on search behavior aggregated over many users. If aggregated data such as search query reformulations is not handled properly, it can allow users to be identified and their privacy compromised. Besides leveraging aggregate data, it is also common for applications to make use of user-specific behavior in order to provide a personalized experience for users. Unlike with aggregate data, privacy is not an issue for individual personalization, since users are the only consumers of their own data. The goal of this work is to explore the effects of personalization and privacy preservation methods on three information retrieval applications, namely search task identification, task-aware query recommendation, and searcher frustration detection. We pursue this goal by first introducing a novel framework called CrowdLogging for logging and aggregating data privately over a distributed set of users. We then describe several privacy mechanisms for sanitizing global data, including one novel mechanism based on differential privacy. We present a template for describing how local user data and global aggregate data are collected, processed, and used within an application, and apply this template to our three applications. We find that sanitizing feature vectors aggregated across users has a low impact on performance for classification applications (search task identification and searcher frustration detection). However, sanitizing free-text query reformulations is extremely detrimental to performance for the query recommendation application we consider.
Personalization is useful to some degree in all the applications we explore when integrated with global information, achieving gains for search task identification, task-aware query recommendation, and searcher frustration detection. Finally, we introduce an open-source system called CrowdLogger that implements the CrowdLogging framework and also serves as a platform for conducting in-situ user studies of search behavior, prototyping and evaluating information retrieval applications, and collecting labeled data.Publication The Security and Privacy Implications of Energy-Proportional Computing(2013-09) Clark, Shane S.The parallel trends of greater energy efficiency and more aggressive power management are yielding computers that inch closer to energy-proportional computing with every generation. Energy-proportional computing, in which power consumption scales closely with workload, has unintended side effects for security and privacy. Saving energy is an unqualified boon for computer operators, but it is becoming easier to identify computing activities by observing power consumption because an energy-proportional computer reveals more about its workload. This thesis demonstrates the potential for system-level power analysis: the inference of a computer's internal states based on power observations at the "plug." It also examines which hardware components and software workloads have the greatest impact on information leakage. This thesis identifies the potential for privacy violations by demonstrating that a malicious party could identify which webpage from a given corpus a user is viewing with greater than 99% accuracy. It also identifies constructive applications for power analysis, evaluating its use as an anomaly detection mechanism for embedded devices with greater than 94% accuracy for each device tested.
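A toy sketch of the kind of inference system-level power analysis enables: classify which activity produced an observed power trace by nearest-neighbor distance to labeled reference traces. All traces and labels below are invented; the thesis works from real AC and DC measurements, not this simplified model.

```python
def l2(a, b):
    """Euclidean distance between two equal-length power traces."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Hypothetical reference power traces (watts sampled over time).
reference_traces = {
    "video": [5.0, 9.0, 9.5, 9.0, 8.5],   # sustained high draw
    "text":  [5.0, 6.0, 5.5, 5.0, 5.0],   # brief render, then idle
}

def identify(observed):
    """Label an observed trace with its nearest reference workload."""
    return min(reference_traces,
               key=lambda label: l2(observed, reference_traces[label]))

print(identify([5.1, 8.8, 9.4, 9.1, 8.4]))  # -> video
```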
Finally, this thesis includes modeling work that correlates AC and DC power consumption to pinpoint which components contribute most to information leakage, and analyzes software workloads to identify which classes of work leak the most information. Understanding the security and privacy risks and opportunities that come with energy-proportional computing will allow future systems to either apply system-level power analysis fruitfully or thwart its malicious application.Publication Query-Time Optimization Techniques for Structured Queries in Information Retrieval(2013-09) Cartright, Marc-AllenThe use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective, this translates into an increasing computational cost to generate the final ranked list in response to a query. Therefore we encounter an increasing tension in the trade-off between retrieval effectiveness (quality of result list) and efficiency (the speed at which the list is generated). This tension creates a strong need for optimization techniques to improve the efficiency of ranking with respect to these more complex retrieval models. This thesis presents three new optimization techniques designed to deal with different aspects of structured queries. The first technique involves manipulation of interpolated subqueries, a common structure found across a large number of retrieval models today. We then develop an alternative scoring formulation to make retrieval models more responsive to dynamic pruning techniques. The last technique is delayed execution, which focuses on the class of queries that utilize term dependencies and term conjunction operations.
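As a concrete illustration of the dynamic pruning that the alternative scoring formulation targets, here is a minimal upper-bound pruning sketch in the style of MaxScore: skip fully scoring a document when the sum of per-term score upper bounds cannot beat the current k-th best score. The documents, bounds, and scores are invented; this is not the thesis's algorithm, only the standard idea it builds on.

```python
import heapq

def rank_with_pruning(docs, term_upper_bounds, score_fn, k=2):
    """docs: {doc_id: set of terms}. Prune a document when its summed
    per-term upper bounds cannot enter the current top-k."""
    top = []                                   # min-heap of (score, doc_id)
    fully_scored = 0
    for doc_id, terms in docs.items():
        bound = sum(term_upper_bounds[t] for t in terms)
        if len(top) == k and bound <= top[0][0]:
            continue                           # pruned without full scoring
        fully_scored += 1
        heapq.heappush(top, (score_fn(doc_id), doc_id))
        if len(top) > k:
            heapq.heappop(top)
    return sorted(top, reverse=True), fully_scored

docs = {"d1": {"a", "b"}, "d2": {"a"}, "d3": {"b"}}
bounds = {"a": 2.0, "b": 1.0}                  # per-term score upper bounds
true_scores = {"d1": 2.5, "d2": 1.8, "d3": 0.6}

ranked, scored = rank_with_pruning(docs, bounds, true_scores.get)
print(ranked)   # [(2.5, 'd1'), (1.8, 'd2')]
print(scored)   # only 2 documents fully scored; d3 was pruned
```

The result list is identical to exhaustive scoring; only the work to produce it shrinks, which is exactly the effectiveness/efficiency trade-off the abstract describes.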
In each case, we empirically show that these optimizations can significantly improve query processing efficiency without negatively impacting retrieval effectiveness. Additionally, we implement these optimizations in the context of a new retrieval system known as Julien. As opposed to implementing these techniques as one-off solutions hard-wired to specific retrieval models, we treat each technique as a ``behavioral'' extension to the original system. This allows us to flexibly stack the modifications to use the optimizations in conjunction, increasing efficiency even further. By focusing on the behaviors of the objects involved in the retrieval process instead of on the details of the retrieval algorithm itself, we can recast these techniques to be applied only when the conditions are appropriate. Finally, the modular design of these components illustrates a system design that allows improvements to be implemented without disturbing the existing retrieval infrastructure.Publication High-Performance Processing of Continuous Uncertain Data(2013-05) Tran, Thanh Thi LacUncertain data has arisen in a growing number of applications such as sensor networks, RFID systems, weather radar networks, and digital sky surveys. The fact that the raw data in these applications is often incomplete, imprecise and even misleading has two implications: (i) the raw data is not suitable for direct querying, and (ii) feeding the uncertain data into existing systems produces results of unknown quality. This thesis presents a system for uncertain data processing that has two key functionalities: (i) capturing and transforming raw noisy data to rich queryable tuples that carry attributes needed for query processing with quantified uncertainty, and (ii) performing query processing on such tuples, which captures changes of uncertainty as data goes through various query operators.
The proposed system considers data naturally captured by continuous distributions, which is prevalent in sensing and scientific applications. The first part of the thesis addresses data capture and transformation by proposing a probabilistic modeling and inference approach. Since this task is application-specific and requires domain knowledge, the approach is demonstrated for RFID data from mobile readers. More specifically, the proposed solution involves an inference and cleaning substrate to transform raw RFID data streams to object-location tuple streams, where locations are inferred from raw noisy data and their uncertain values are captured by probability distributions. The second, and main, part of this thesis examines query processing for uncertain data modeled by continuous random variables. The proposed system includes new data models and algorithms for relational processing, with a focus on aggregation and conditioning operations. For operations of high complexity, optimizations including approximations with guaranteed error bounds are considered. Then complex queries involving a mix of operations are addressed by query planning, which, given a query, finds an efficient plan that meets user-defined accuracy requirements. Besides relational processing, this thesis also provides support for user-defined functions (UDFs) on uncertain data, which aims to compute the output distribution given uncertain input and a black-box UDF. The proposed solution employs a learning-based approach using Gaussian processes to compute approximate output with error bounds, and a suite of optimizations for high performance in online settings such as data stream processing and interactive data analysis. The techniques proposed in this thesis are thoroughly evaluated using both synthetic data with controlled properties and various real-world datasets from the domains of severe weather monitoring, object tracking using RFID readers, and computational astrophysics.
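The simplest instance of aggregation over continuous distributions is instructive: the SUM of independent Gaussian-distributed attributes is again Gaussian, with means and variances added. The toy sketch below (readings invented) illustrates only this base case; the thesis handles far more general operators, conditioning, and approximation with error bounds.

```python
def gaussian_sum(attrs):
    """attrs: list of (mean, variance) pairs for independent
    Gaussian-distributed attribute values. Returns the (mean, variance)
    of their exact SUM distribution, which is itself Gaussian."""
    mean = sum(m for m, _ in attrs)
    var = sum(v for _, v in attrs)
    return mean, var

# Three uncertain sensor readings, each carried as (mean, variance).
readings = [(10.0, 0.5), (12.0, 0.25), (9.5, 1.0)]
print(gaussian_sum(readings))  # (31.5, 1.75)
```

Note how uncertainty (variance) grows through the operator even as the point estimate stays simple; tracking such changes through every operator is what the query processor above does.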
The experimental results show that these techniques can yield high accuracy, meet stream speeds, and outperform existing techniques such as Monte Carlo sampling for many important workloads.Publication Elastic Resource Management in Cloud Computing Platforms(2013-05) Sharma, UpendraLarge-scale enterprise applications are known to experience dynamic workloads; provisioning the correct capacity for these applications remains an important and challenging problem. Predicting highly variable fluctuations in workload, or the peak workload, is difficult; erroneous predictions often lead to under-utilized systems or, in some situations, cause temporary outages of an otherwise well-provisioned web site. Consequently, rather than provisioning server capacity to handle infrequent peak workloads, an alternate approach of dynamically provisioning capacity on-the-fly in response to workload fluctuations has become popular. Cloud platforms are particularly suited for such applications due to their ability to provision capacity when needed and charge for usage on a pay-per-use basis. Cloud environments enable elastic provisioning by providing a variety of hardware configurations as well as mechanisms to add or remove server capacity. The first part of this thesis presents Kingfisher, a cost-aware system that provides a generalized provisioning framework for supporting elasticity in the cloud by (i) leveraging multiple mechanisms to reduce the time to transition to new configurations, and (ii) optimizing the selection of a virtual server configuration that minimizes cost. The majority of these enterprise applications, deployed as web applications, are distributed or replicated with a multi-tier architecture. SLAs for such applications are often expressed as a high percentile of a performance metric, e.g., the 99th percentile of end-to-end response time is less than 1 second.
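A percentile SLA of the kind just described can be checked mechanically. This toy sketch uses the nearest-rank percentile definition and invented latencies; production systems typically compute percentiles over sliding windows of real measurements.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of
    the samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, -(-p * len(ordered) // 100))   # ceil(p * n / 100)
    return ordered[int(rank) - 1]

# Hypothetical end-to-end response times in seconds.
latencies = [0.12, 0.30, 0.25, 0.40, 0.22, 0.95, 0.31, 0.28, 0.33, 0.27]
p99 = percentile(latencies, 99)
print(p99, p99 < 1.0)   # SLA "99th percentile < 1 second" holds
```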
In the second part of this thesis I present a model-driven technique which provisions a multi-tier application for such an SLA and is targeted at cloud platforms. Enterprises critically depend on these applications and often own large IT infrastructures to support their regular operation. However, provisioning for a peak load, or for a high percentile of response time, can be prohibitively expensive. Thus there is a need for a hybrid cloud model, where the enterprise uses its own private resources for the majority of its computing, but then "bursts" into the cloud when local resources are insufficient. I discuss a new system, namely Seagull, which performs dynamic provisioning over a hybrid cloud model by enabling cloud bursting. Finally, I describe a methodology to model the configuration patterns (i.e., deployment topologies) of the different control plane services of a cloud management system itself. I present a generic methodology, based on empirical profiling, which provides the initial deployment configuration of a control plane service, and also a mechanism which iteratively adjusts the configuration to avoid violation of the control plane's Service Level Objective (SLO).
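The cost-aware configuration choice at the heart of Kingfisher can be illustrated with a toy exhaustive search over server mixes: pick the cheapest combination whose total capacity meets the current workload. The server types, capacities, and prices below are hypothetical, and the real system also optimizes the transition between configurations, which this sketch ignores.

```python
from itertools import product

# Hypothetical catalog: name -> (capacity in requests/sec, cost per hour).
server_types = {
    "small": (100, 0.05),
    "large": (450, 0.20),
}

def cheapest_config(required_capacity, max_per_type=10):
    """Brute-force the cheapest mix of servers meeting the capacity."""
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(server_types)):
        cap = cost = 0.0
        for n, (capacity, price) in zip(counts, server_types.values()):
            cap += n * capacity
            cost += n * price
        if cap >= required_capacity and (best is None or cost < best[0]):
            best = (cost, dict(zip(server_types, counts)))
    return best

print(cheapest_config(900))   # two large servers beat nine small ones
```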