Wednesday, 19 December 2018

Fast network discovery on sequence data via time-aware hashing

Abstract

Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, climate science, and economics, among others. In domains where networks are discovered from multiple time series, the most common approach is to compute measures of association or similarity between all pairs of time series. Each node in the resulting network corresponds to a time series, and each pair of nodes is linked by an edge weighted by the association score of its endpoints. Finally, the fully connected network is thresholded so that only the strongest edges remain and the desired sparsity level is reached. While this approach is feasible for small datasets, its quadratic (or higher) time complexity does not scale as the length of the individual time series and the number of compared series grow. To avoid the wasteful intermediate step of building a fully connected graph before sparsifying it, we propose a fast network discovery approach based on probabilistic hashing. Our methods emphasize consecutiveness: the intuition that time series following similar fluctuations over longer time-consecutive intervals are more similar overall. Evaluation on real data shows that our method builds graphs nearly 15 times faster than the baselines (when the baselines do not run out of memory), while achieving accuracy comparable to, or better than, that of the baselines in task-based evaluation. Furthermore, our proposals are general and modular, and may be applied to a variety of sequence similarity search tasks.
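To make the contrast between the all-pairs baseline and a hashing-based pipeline concrete, here is a minimal sketch in Python. It is not the authors' implementation; every choice in it is an assumption made for illustration. Series are binarized by the sign of their first difference (rise or no rise), `consecutive_similarity` is a hypothetical score in which each maximal run of k agreeing positions contributes k², so long consecutive agreement outweighs the same number of scattered matches, and `hashed_network` is a generic LSH-style banding scheme that buckets series by the bits inside randomly sampled windows of consecutive positions. Only series that collide in at least one bucket are scored, which avoids building the fully connected graph.

```python
import random
from collections import defaultdict
from itertools import combinations


def binarize(series):
    """Assumed encoding: 1 if the value rises at this step, else 0."""
    return tuple(1 if b > a else 0 for a, b in zip(series, series[1:]))


def consecutive_similarity(x, y):
    """Hypothetical consecutiveness-weighted agreement score in [0, 1].

    Each maximal run of k agreeing positions adds k * k, so one long run
    of agreement scores higher than the same count of scattered matches.
    """
    n = len(x)
    score, run = 0, 0
    for a, b in zip(x, y):
        if a == b:
            run += 1
        else:
            score += run * run
            run = 0
    score += run * run
    return score / (n * n) if n else 0.0


def all_pairs_network(codes, threshold):
    """Baseline: score every pair (quadratic in the number of series), then threshold."""
    return {
        (i, j)
        for i, j in combinations(range(len(codes)), 2)
        if consecutive_similarity(codes[i], codes[j]) >= threshold
    }


def hashed_network(codes, threshold, num_tables=8, window=4, seed=0):
    """LSH-style sketch: score only pairs that collide on some sampled window."""
    length = len(codes[0])
    if window > length:
        raise ValueError("window must not exceed the code length")
    rng = random.Random(seed)
    candidates = set()
    for _ in range(num_tables):
        start = rng.randrange(0, length - window + 1)
        buckets = defaultdict(list)
        for idx, code in enumerate(codes):
            # Bucket key: the consecutive bits inside the sampled window.
            buckets[code[start:start + window]].append(idx)
        for members in buckets.values():
            # Members are appended in index order, so pairs come out as (i, j) with i < j.
            candidates.update(combinations(members, 2))
    return {
        (i, j)
        for i, j in candidates
        if consecutive_similarity(codes[i], codes[j]) >= threshold
    }


if __name__ == "__main__":
    rng = random.Random(42)
    base = [rng.gauss(0, 1) for _ in range(50)]
    # Hypothetical data: five noisy copies of one signal plus five unrelated noise series.
    series = [[v + rng.gauss(0, 0.1) for v in base] for _ in range(5)]
    series += [[rng.gauss(0, 1) for _ in range(50)] for _ in range(5)]
    codes = [binarize(s) for s in series]
    exact = all_pairs_network(codes, threshold=0.1)
    approx = hashed_network(codes, threshold=0.1)
    print(len(exact), len(approx), approx <= exact)
```

Because the hashed version scores a subset of the pairs with the same threshold, its edges are always a subset of the baseline's; raising `num_tables` improves recall at the cost of more candidate pairs, while widening `window` makes collisions rarer and more selective for long stretches of consecutive agreement.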



https://ift.tt/2Cm2inA
