Iris: Tuning the configuration parameters of NoSQL databases for high-throughput digital agricultural processing pipelines

Published by the American Society of Agricultural and Biological Engineers, St. Joseph, Michigan www.asabe.org

Citation:  2019 ASABE Annual International Meeting  1900330. (doi:10.13031/aim.201900330)
Authors:   Ashraf Y Mahgoub, Rakesh Kumar, Somali Chaterji
Keywords:   Agricultural Processing Pipelines, Dynamic workloads, Performance Prediction

Abstract. Precision agriculture (Precision AG) provides accurate farming techniques through advanced monitoring, measurement, and timely decisions. Powered by NoSQL datastores, agricultural processing pipelines can now scale beyond what traditional database management systems, such as PostgreSQL, can achieve. However, tuning NoSQL datastores for high throughput and low latency under precision agriculture workloads is challenging for several reasons. First, NoSQL datastores have many performance-sensitive configuration parameters; Cassandra, for example, has 50. Second, the aggregate workload in precision AG environments changes over time due to environmental events such as flash floods or the onset of crop diseases. When the workload changes, new configuration parameters are needed to sustain optimal performance. In this paper, we introduce our system, Iris, to tune Redis, one of the most popular NoSQL datastores. First, we apply machine learning techniques to identify the most impactful performance-sensitive parameters to tune. Second, we use performance prediction models, deep learning and random forest variants, as surrogate models for the NoSQL datastore. This allows new configuration parameters to be tested against a new workload much faster than benchmarking the actual NoSQL datastore each time the workload changes. Finally, we show that Iris achieves better performance than both the NoSQL default configuration and the best static configuration in throughput and latency.
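
The surrogate-model idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two parameters (`cache_fraction`, `io_threads`), the synthetic `slow_benchmark` formula, and the workload descriptor `read_ratio` are hypothetical and are not Iris's actual parameters or data. A simple Gaussian-kernel regressor stands in for the deep learning and random forest models named in the abstract, only to keep the example dependency-free.

```python
import itertools
import math

# Hypothetical configuration space (not real datastore parameters).
CACHE_FRACTIONS = [0.1, 0.3, 0.5, 0.7, 0.9]
IO_THREADS = [1, 2, 4]
CANDIDATES = list(itertools.product(CACHE_FRACTIONS, IO_THREADS))


def slow_benchmark(config, read_ratio):
    """Synthetic stand-in for benchmarking a real datastore (made-up formula)."""
    cache_fraction, io_threads = config
    mismatch = (cache_fraction - read_ratio) ** 2
    return 100 - 100 * mismatch + 2 * io_threads * (1 - read_ratio)


def features(config, read_ratio):
    """Scale every input to roughly [0, 1] so distances are comparable."""
    cache_fraction, io_threads = config
    return (cache_fraction, io_threads / max(IO_THREADS), read_ratio)


class KernelSurrogate:
    """Gaussian-kernel regression over past benchmark results."""

    def __init__(self, bandwidth=0.1):
        self.bandwidth = bandwidth
        self.samples = []  # list of (feature_vector, measured_throughput)

    def fit(self, observations):
        self.samples = [(features(c, r), y) for c, r, y in observations]

    def predict(self, config, read_ratio):
        x = features(config, read_ratio)
        weighted_sum = total_weight = 0.0
        for point, throughput in self.samples:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, point))
            weight = math.exp(-sq_dist / (2 * self.bandwidth ** 2))
            weighted_sum += weight * throughput
            total_weight += weight
        return weighted_sum / total_weight


def recommend(surrogate, read_ratio):
    """Pick the candidate with the highest predicted throughput."""
    return max(CANDIDATES, key=lambda c: surrogate.predict(c, read_ratio))


# Offline: benchmark every candidate under a few training workloads.
observations = [
    (c, r, slow_benchmark(c, r)) for c in CANDIDATES for r in (0.1, 0.5, 0.9)
]
surrogate = KernelSurrogate()
surrogate.fit(observations)

# Online: the workload shifts to unseen read ratios; no new benchmark runs.
best_read_heavy = recommend(surrogate, read_ratio=0.8)
best_write_heavy = recommend(surrogate, read_ratio=0.2)
print(best_read_heavy, best_write_heavy)
```

The point of the design is that `slow_benchmark` is called only while collecting training observations. Once the workload changes, candidate configurations are ranked with `surrogate.predict`, which is cheap compared with rerunning the datastore.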
