Oman R Users
2023-11-08
github: rsangole




tidyverse, particularly dplyr

The arrow R package exposes an interface to the Arrow C++ library, enabling access to many of its features from R:
Read and write
Data analysis
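As a minimal sketch of the "read and write" side, the snippet below round-trips a data frame through a single Parquet file. It assumes only that the arrow package is installed, and it uses the built-in mtcars data as a stand-in for real data.

```r
library(arrow)

pq_file <- tempfile(fileext = ".parquet")

# Write an R data frame to a single Parquet file
write_parquet(mtcars, pq_file)

# Read it back into R as a data frame...
cars_df <- read_parquet(pq_file)

# ...or keep it in Arrow memory as an Arrow Table instead of converting to R
cars_tbl <- read_parquet(pq_file, as_data_frame = FALSE)
```

Keeping the data as an Arrow Table (`as_data_frame = FALSE`) avoids the conversion cost when the next step is another Arrow query.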


Yellow and green taxi trip records, including pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, and passenger counts
Size: 40 GB on disk
Dimensions: 1.15B rows x 24 columns!
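Data of this size is usually stored as many Parquet files and opened with `open_dataset()`, which scans only the metadata up front. The sketch below illustrates that pattern on a tiny synthetic stand-in; the columns (`year`, `month`, `passenger_count`, `trip_distance`), the values, and the `taxi_demo` directory are hypothetical and not the actual taxi files used in the talk.

```r
library(arrow)
library(dplyr, warn.conflicts = FALSE)

# Synthetic stand-in for the taxi data (hypothetical columns and values)
set.seed(42)
trips <- data.frame(
  year = rep(c(2019L, 2020L), each = 6),
  month = rep(1:3, times = 4),
  passenger_count = sample(1:4, 12, replace = TRUE),
  trip_distance = round(runif(12, 0.5, 20), 1)
)

taxi_dir <- file.path(tempdir(), "taxi_demo")

# Write one directory per year/month, e.g. taxi_demo/year=2019/month=1/
write_dataset(trips, taxi_dir, format = "parquet",
              partitioning = c("year", "month"))

# Opening the dataset reads no rows yet
taxi <- open_dataset(taxi_dir)

# Filtering on a partition column lets Arrow skip the year=2020 files
by_month <- taxi |>
  filter(year == 2019) |>
  group_by(month) |>
  summarize(trips = n(), mean_distance = mean(trip_distance)) |>
  collect()
```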

Ref: https://parquet.apache.org
Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
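Because Parquet lays data out column by column, a reader can fetch only the columns a query needs. A minimal sketch of that with arrow's `col_select` argument, again assuming mtcars as example data:

```r
library(arrow)

pq_file <- tempfile(fileext = ".parquet")
write_parquet(mtcars, pq_file)

# Only the two requested columns are read from the file
mpg_hp <- read_parquet(pq_file, col_select = c("mpg", "hp"))
```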
arrow evaluates lazily by default; collect() executes the query and returns the result to R
Supported {dplyr} verbs include filter, select, mutate, join, distinct, group_by + summarize, and across
to_duckdb() saves the day for pivoting and window functions
register_scalar_function() can be used to create UDFs (user-defined functions)
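A small sketch of the lazy evaluation described above, using mtcars in an Arrow Table as example data. The dplyr pipeline only builds a query object; nothing is computed until `collect()` is called.

```r
library(arrow)
library(dplyr, warn.conflicts = FALSE)

tbl <- arrow_table(mtcars)

# Building the pipeline does no work yet: it returns an arrow_dplyr_query
query <- tbl |>
  filter(hp > 100) |>
  select(cyl, hp, mpg) |>
  group_by(cyl) |>
  summarize(n = n(), mean_mpg = mean(mpg))

# collect() runs the query in Arrow and returns a tibble
result <- query |>
  collect() |>
  arrange(cyl)
```

Calling `compute()` instead of `collect()` would run the query but keep the result as an Arrow Table.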

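When a verb is not available in arrow, `to_duckdb()` hands the data to DuckDB, where dbplyr translates dplyr code to SQL. The sketch below uses a window function (the highest-horsepower car per cylinder group); it assumes the duckdb and dbplyr packages are installed alongside arrow.

```r
library(arrow)
library(dplyr, warn.conflicts = FALSE)

# Assumes duckdb and dbplyr are installed; to_duckdb() needs both
ranked <- arrow_table(mtcars) |>
  to_duckdb() |>
  group_by(cyl) |>
  # Translated to a SQL window function partitioned by cyl
  mutate(hp_rank = min_rank(desc(hp))) |>
  ungroup() |>
  filter(hp_rank == 1) |>
  collect()
```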

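A minimal sketch of a UDF registered with `register_scalar_function()`. The function name `miles_to_km` and the `trip_distance` column are hypothetical examples, not taken from the talk. With `auto_convert = TRUE`, the R function receives ordinary R vectors, and its first argument is the `context` that arrow passes in.

```r
library(arrow)
library(dplyr, warn.conflicts = FALSE)

# Register an R function as an Arrow compute function (hypothetical name)
register_scalar_function(
  "miles_to_km",
  function(context, trip_distance) {
    trip_distance * 1.609344
  },
  in_type = schema(trip_distance = float64()),
  out_type = float64(),
  auto_convert = TRUE
)

trips <- arrow_table(trip_distance = c(1, 2.5, 10))

# The UDF can now be called inside dplyr verbs on Arrow data
converted <- trips |>
  mutate(trip_km = miles_to_km(trip_distance)) |>
  collect()
```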
rsangole/oman-rusers-arrow
rahulsangole

Rahul S | Oman R Users | Nov 2023