Streaming Dataframes

This work is supported by Anaconda Inc and the Data Driven Discovery Initiative from the Moore Foundation.

This post is about experimental software. This is not ready for public use. All code examples and API in this post are subject to change without warning.

Summary

This post describes a prototype project to handle continuous data sources of tabular data using Pandas and Streamz.

Introduction

Some data never stops. It arrives continuously in a constant, never-ending stream. This happens in financial time series, web server logs, scientific instruments, IoT telemetry, and more. Algorithms to handle this data are slightly different from what you find in libraries like NumPy and Pandas, which assume that they know all of the data up-front. It’s still possible to use NumPy and Pandas, but you need to combine them with some cleverness and keep enough intermediate data around to compute marginal updates when new data comes in.

Example: Streaming Mean

For example, imagine that we have a continuou…
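To make the "marginal update" idea concrete, here is a minimal sketch in plain Pandas. It is not the post's own code and not the Streamz API: `StreamingMean` is a hypothetical helper, and the sketch assumes every batch arrives as a DataFrame with the same numeric columns. Instead of storing every row, it keeps only a running per-column sum and a row count, which is enough to produce the exact mean after each new batch.

```python
import pandas as pd


class StreamingMean:
    """Running mean over a stream of DataFrame batches.

    Hypothetical helper for illustration only (not part of Pandas or Streamz).
    Assumes every batch has the same numeric columns.
    """

    def __init__(self):
        self.total = None  # running per-column sum (a pandas Series)
        self.count = 0     # number of rows seen so far

    def update(self, batch: pd.DataFrame) -> pd.Series:
        # Marginal update: fold this batch's column sums and row count
        # into the accumulated state; earlier rows are never revisited.
        batch_sum = batch.sum()
        self.total = batch_sum if self.total is None else self.total + batch_sum
        self.count += len(batch)
        # The current mean is derived from the two accumulators.
        return self.total / self.count


sm = StreamingMean()
print(sm.update(pd.DataFrame({"x": [1, 2, 3], "y": [10.0, 20.0, 30.0]})))
# mean so far: x = 2.0, y = 20.0
print(sm.update(pd.DataFrame({"x": [4, 5], "y": [40.0, 50.0]})))
# mean so far: x = 3.0, y = 30.0
```

Because the state is one Series and one integer, memory stays constant no matter how long the stream runs, and each update costs only as much as summing the new batch.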

matthewrocklin.com