Challenges for Dataset Search
Ranked search over datasets has become a necessity as shared scientific archives grow in size and variety. Our own investigations have shown that information-retrieval (IR) style, feature-based relevance scoring can be an effective tool for data discovery in such archives. However, maintaining interactive response times as archives scale will be a challenge. We report here on our exploration of performance techniques for Data Near Here, a dataset search service. We present a sample of results evaluating filter-restart techniques in our system, including two variations: adaptive relaxation and contraction. We then outline further directions for research in this domain.