Oman R Users
2023-11-08
GitHub: rsangole
tidyverse, particularly dplyr
The arrow R package exposes an interface to the Arrow C++ library, enabling access to many of its features in R
Read and write
Data analysis
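Here is a minimal sketch of the read/write side. It assumes the arrow package is installed and uses the built-in mtcars data as a stand-in. The data frame is converted to an Arrow Table, written to a single Parquet file in a temporary location, and read back.

```r
library(arrow)

# Convert an R data frame into an Arrow Table (the data now lives in Arrow's C++ memory)
tbl <- arrow_table(mtcars)

# Write the table to a single Parquet file in a temporary location
path <- tempfile(fileext = ".parquet")
write_parquet(tbl, path)

# Read it back; by default this returns an R data frame
df <- read_parquet(path)
print(dim(df))

# as_data_frame = FALSE keeps the result as an Arrow Table instead
tbl2 <- read_parquet(path, as_data_frame = FALSE)
print(tbl2$num_rows)
```

Keeping results as Arrow Tables avoids copying into R memory until the data is actually needed.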
Yellow and green taxi trip records… pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, passenger counts…
Size: 40 GB on disk
Dimensions: 1.15 B rows × 24 columns!
Ref: https://parquet.apache.org
Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
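At the scale described above, data is usually stored as a directory of Parquet files rather than as a single file. The sketch below is illustrative only. The trips data frame is a hypothetical ten-row stand-in for the taxi records, and partitioning by year is an assumed layout. write_dataset() writes one subdirectory per year. open_dataset() then treats the whole directory as one dataset without reading the rows into memory.

```r
library(arrow)

# Hypothetical stand-in for the taxi data: ten trips across two years
trips <- data.frame(
  year = rep(c(2019L, 2020L), each = 5),
  fare = c(5.5, 12, 7.25, 30, 9, 6, 14.5, 8, 22, 11)
)

# Write Parquet files partitioned by year (Hive-style folders such as year=2019/)
out_dir <- file.path(tempdir(), "trips")
write_dataset(trips, out_dir, format = "parquet", partitioning = "year")

# Open the directory as a single Dataset; rows stay on disk until queried
ds <- open_dataset(out_dir)
print(dim(ds))
print(list.files(out_dir, recursive = TRUE))
```

The partition column is recovered from the folder names, so year still appears as a column even though it is not stored inside the files.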
arrow evaluates lazily by default; collect() runs the query and returns the result to R
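The sketch below shows lazy evaluation in action, again using mtcars as stand-in data. Piping an Arrow Table through dplyr verbs returns a query object rather than data. Only collect() executes the query and hands back an R data frame.

```r
library(arrow)
library(dplyr)

tbl <- arrow_table(mtcars)

# Building the pipeline only records the steps; nothing is computed yet
query <- tbl %>%
  filter(cyl == 6) %>%
  select(mpg, cyl, hp)

print(class(query))

# collect() executes the plan in Arrow and returns an R data frame
result <- collect(query)
print(nrow(result))
```

If the result should stay in Arrow memory for further processing, compute() can be used in place of collect(). It executes the query but returns an Arrow Table.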
{dplyr} verbs: filter, select, mutate, join, distinct, group_by + summarize, and across
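The sketch below chains the verbs listed above on an Arrow Table. It assumes a recent arrow release, since older versions support fewer of these verbs (across in particular). mtcars is stand-in data. The cyl_labels lookup table and the power_to_weight column are hypothetical and exist only to give the join and mutate something to do.

```r
library(arrow)
library(dplyr)

cars <- arrow_table(mtcars)

# Hypothetical lookup table, used only to demonstrate a join
cyl_labels <- arrow_table(
  data.frame(cyl = c(4, 6, 8), size = c("small", "mid", "large"))
)

summary_df <- cars %>%
  filter(hp > 100) %>%
  select(cyl, mpg, hp, wt) %>%
  mutate(power_to_weight = hp / wt) %>%           # hypothetical derived column
  left_join(cyl_labels, by = "cyl") %>%
  group_by(size) %>%
  summarize(across(c(mpg, power_to_weight), mean), n = n()) %>%
  arrange(size) %>%
  collect()

# distinct() also runs inside Arrow
cyl_values <- cars %>% distinct(cyl) %>% collect()

print(summary_df)
```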
to_duckdb() saves the day for pivoting and window functions
register_scalar_function can be used to create UDFs (user-defined functions)
rsangole/oman-rusers-arrow
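Here is a combined sketch of both escape hatches. It assumes a recent arrow release plus the duckdb and dbplyr packages, which to_duckdb() needs. The UDF name fuel_cost and its fixed price of 3.5 per gallon are hypothetical. The first half registers an R function that Arrow can call inside a query. The second half hands an Arrow query to DuckDB, where dbplyr translates a grouped min_rank() into a SQL window function.

```r
library(arrow)
library(dplyr)

# UDF: register an R function under a (hypothetical) name usable in Arrow queries.
# The first argument is always the kernel context; auto_convert = TRUE passes R vectors.
register_scalar_function(
  "fuel_cost",
  function(context, mpg) 100 / mpg * 3.5,  # cost per 100 miles at an assumed 3.5 per gallon
  in_type = float64(),
  out_type = float64(),
  auto_convert = TRUE
)

with_cost <- arrow_table(mtcars) %>%
  select(mpg) %>%
  mutate(cost = fuel_cost(mpg)) %>%
  collect()

# to_duckdb(): continue the Arrow query in DuckDB to use a window function
ranked <- arrow_table(mtcars) %>%
  select(cyl, mpg) %>%
  to_duckdb() %>%
  group_by(cyl) %>%
  mutate(mpg_rank = min_rank(desc(mpg))) %>%  # becomes RANK() OVER (PARTITION BY cyl ...)
  ungroup() %>%
  collect()
```

The DuckDB step works on the same data without writing it out first. arrow::to_arrow() can bring a DuckDB result back into Arrow if later steps should run there.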
rahulsangole
Rahul S | Oman R Users | Nov 2023