Package 'mongolite' reference manual

Title:	Fast and Simple 'MongoDB' Client for R
Description:	High-performance MongoDB client based on 'mongo-c-driver' and 'jsonlite'. Includes support for aggregation, indexing, map-reduce, streaming, encryption, enterprise authentication, and GridFS. The online user manual provides an overview of the available methods in the package: <https://jeroen.github.io/mongolite/>.
Authors:	Jeroen Ooms [aut, cre], MongoDB, Inc [cph] (Bundled mongo-c-driver, see AUTHORS file)
Maintainer:	Jeroen Ooms <[email protected]>
License:	Apache License 2.0
Version:	2.8.2
Built:	2024-12-02 22:21:25 UTC
Source:	https://github.com/jeroen/mongolite

GridFS API

Description

Connect to a GridFS database to search, read, write and delete files.

Usage

gridfs(
  db = "test",
  url = "mongodb://localhost",
  prefix = "fs",
  options = ssl_options()
)

Arguments

`db`	name of database
`url`	address of the mongodb server in mongo connection string URI format
`prefix`	string to prefix the collection name
`options`	additional connection options such as SSL keys/certs.

Details

GridFS supports two interfaces for transferring data. The fs$read() and fs$write() methods are the most flexible: they stream data to or from an R connection, such as a file, socket, or URL. These methods support a progress counter and can be interrupted if needed. They are recommended for reading or writing single files.

The fs$upload() and fs$download() methods, on the other hand, copy directly between GridFS and your local disk. This API is vectorized, so it can transfer many files at once. However, individual transfers cannot be interrupted and will block R until completed. This API is recommended only for uploading or downloading a large number of small files.

Modifying files in GridFS is currently unsupported: uploading a file with the same name will generate a new file.

Methods

find(filter = "{}", options = "{}"): Search and list files in the GridFS
download(name, path = '.'): Download one or more files from GridFS to disk. Path may be an existing directory or a vector of filenames of the same length as 'name'.
upload(path, name = basename(path), content_type = NULL, metadata = NULL): Upload one or more files from disk to GridFS. Metadata is an optional JSON string.
read(name, con = NULL, progress = TRUE): Reads a single file from GridFS into a writable R connection. If con is a string it is treated as a filepath; if it is NULL then the output is buffered in memory and returned as a raw vector.
write(con, name, content_type = NULL, metadata = NULL, progress = TRUE): Stream write a single file into GridFS from a readable R connection. If con is a string it is treated as a filepath; it may also be a raw vector containing the data to upload. Metadata is an optional JSON string.
remove(name): Remove a single file from the GridFS
drop(): Removes the entire GridFS collection, including all files

Examples

# Upload a file to GridFS
fs <- gridfs(url = "mongodb+srv://readwrite:[email protected]/test")
input <- file.path(R.home('doc'), "html/logo.jpg")
fs$upload(input, name = 'logo.jpg')

# Download the file back to disk
output <- file.path(tempdir(), 'logo1.jpg')
fs$download('logo.jpg', output)

# Or you can also stream it
con <- file(file.path(tempdir(), 'logo2.jpg'))
fs$read('logo.jpg', con)

# Delete the file on the server
fs$remove('logo.jpg')

files <- c(input, file.path(tempdir(), c('logo1.jpg', 'logo2.jpg')))
hashes <- tools::md5sum(files)
stopifnot(length(unique(hashes)) == 1)

## Not run: 
# Insert Binary Data
fs <- gridfs()
buf <- serialize(nycflights13::flights, NULL)
fs$write(buf, 'flights')
out <- fs$read('flights')
flights <- unserialize(out$data)

tmp <- file.path(tempdir(), 'flights.rds')
fs$download('flights', tmp)
flights2 <- readRDS(tmp)
stopifnot(all.equal(flights, nycflights13::flights))
stopifnot(all.equal(flights2, nycflights13::flights))

# Show what we have
fs$find()
fs$drop()

## End(Not run)

MongoDB client

Description

Connect to a MongoDB collection. Returns a mongo connection object with the methods listed below. Connections are automatically pooled between collection and gridfs objects that use the same database.

Usage

mongo(
  collection = "test",
  db = "test",
  url = "mongodb://localhost",
  verbose = FALSE,
  options = ssl_options()
)

Arguments

`collection`	name of collection
`db`	name of database
`url`	address of the mongodb server in mongo connection string URI format
`verbose`	emit some more output
`options`	additional connection options such as SSL keys/certs.

Details

This manual page is deliberately minimal; see the mongolite user manual for more details and worked examples.

Value

Upon success returns a pointer to a collection on the server. The collection can be interfaced using the methods described below.

Methods

aggregate(pipeline = '{}', handler = NULL, pagesize = 1000, iterate = FALSE): Execute a pipeline using the Mongo aggregation framework. Set iterate = TRUE to return an iterator instead of data frame.
count(query = '{}'): Count the number of records matching a given query. Default counts all records in collection.
disconnect(gc = TRUE): Disconnect collection. The connection gets disconnected once the client is not used by collections in the pool.
distinct(key, query = '{}'): List unique values of a field given a particular query.
drop(): Delete entire collection with all data and metadata.
export(con = stdout(), bson = FALSE, query = '{}', fields = '{}', sort = '{"_id":1}'): Streams all data from collection to a connection in jsonlines format (similar to mongoexport). Alternatively when bson = TRUE it outputs the binary bson format (similar to mongodump).
find(query = '{}', fields = '{"_id" : 0}', sort = '{}', skip = 0, limit = 0, handler = NULL, pagesize = 1000): Retrieve fields from records matching query. Default handler will return all data as a single dataframe.
import(con, bson = FALSE): Stream import data in jsonlines format from a connection, similar to the mongoimport utility. Alternatively when bson = TRUE it assumes the binary bson format (similar to mongorestore).
index(add = NULL, remove = NULL): List, add, or remove indexes from the collection. The add and remove arguments can either be a field name or json object. Returns a dataframe with current indexes.
info(): Returns collection statistics and server info (if available).
insert(data, pagesize = 1000, stop_on_error = TRUE, ...): Insert rows into the collection. Argument 'data' must be a data frame, a named list (for a single record), or a character vector of JSON strings (one string per row). For lists and data frames, arguments in ... are passed to jsonlite::toJSON.
iterate(query = '{}', fields = '{"_id":0}', sort = '{}', skip = 0, limit = 0): Runs query and returns iterator to read single records one-by-one.
mapreduce(map, reduce, query = '{}', sort = '{}', limit = 0, out = NULL, scope = NULL): Performs a map reduce query. The map and reduce arguments are strings containing a JavaScript function. Set out to a string to store results in a collection instead of returning.
remove(query = "{}", just_one = FALSE): Remove record(s) matching query from the collection.
rename(name, db = NULL): Change the name or database of a collection. Changing name is cheap, changing database is expensive.
replace(query, update = '{}', upsert = FALSE): Replace matching record(s) with value of the update argument.
run(command = '{"ping": 1}', simplify = TRUE): Run a raw mongodb command on the database. If the command returns data, output is simplified by default, but this can be disabled.
update(query, update = '{"$set":{}}', upsert = FALSE, multiple = FALSE): Modify fields of matching record(s) with value of the update argument.
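
Most of the methods above take their query, fields, and sort arguments as JSON strings. As a minimal sketch of composing such a string from R values, the snippet below uses base R's sprintf(). The helper name make_day_query is hypothetical and not part of mongolite, and it only handles whole-number inputs.

```r
# Hypothetical helper (not part of mongolite): build a JSON query string
# matching a given month and day, suitable for a 'query' argument.
make_day_query <- function(month, day) {
  stopifnot(is.numeric(month), is.numeric(day))
  sprintf('{"month":%d, "day":%d}', as.integer(month), as.integer(day))
}

query <- make_day_query(1, 1)
print(query)
```

The resulting string could then be passed as, for example, m$count(query) or m$find(query), where m is a connection object returned by mongo().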

References

Mongolite User Manual: <https://jeroen.github.io/mongolite/>

Jeroen Ooms (2014). The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805. https://arxiv.org/abs/1403.2805

Examples

# Connect to demo server
con <- mongo("mtcars", url =
  "mongodb+srv://readwrite:[email protected]/test")
if(con$count() > 0) con$drop()
con$insert(mtcars)
stopifnot(con$count() == nrow(mtcars))

# Query data
mydata <- con$find()
stopifnot(all.equal(mydata, mtcars))
con$drop()

# Automatically disconnect when connection is removed
rm(con)
gc()

## Not run: 
# dplyr example
library(nycflights13)

# Insert some data
m <- mongo(collection = "nycflights")
m$drop()
m$insert(flights)

# Basic queries
m$count('{"month":1, "day":1}')
jan1 <- m$find('{"month":1, "day":1}')

# Sorting
jan1 <- m$find('{"month":1,"day":1}', sort='{"distance":-1}')
head(jan1)

# Sorting on large data requires index
m$index(add = "distance")
allflights <- m$find(sort='{"distance":-1}')

# Select columns
jan1 <- m$find('{"month":1,"day":1}', fields = '{"_id":0, "distance":1, "carrier":1}')

# List unique values
m$distinct("carrier")
m$distinct("carrier", '{"distance":{"$gt":3000}}')

# Tabulate
m$aggregate('[{"$group":{"_id":"$carrier", "count": {"$sum":1}, "average":{"$avg":"$distance"}}}]')

# Map-reduce (binning)
hist <- m$mapreduce(
  map = "function(){emit(Math.floor(this.distance/100)*100, 1)}",
  reduce = "function(id, counts){return Array.sum(counts)}"
)

# Stream jsonlines into a connection
tmp <- tempfile()
m$export(file(tmp))

# Remove the collection
m$drop()

# Import from jsonlines stream from connection
dmd <- mongo("diamonds")
dmd$import(url("http://jeroen.github.io/data/diamonds.json"))
dmd$count()

# Remove the collection
dmd$drop()

## End(Not run)

Mongo Options

Description

Get and set global client options. Calling with NULL parameters returns current values without modifying.

Usage

mongo_options(log_level = NULL, bigint_as_char = NULL, date_as_char = NULL)

Arguments

`log_level`	integer between 0 and 6 or `NULL` to leave unchanged.
`bigint_as_char`	logical: parse int64 as strings instead of double.
`date_as_char`	logical: parse UTC datetime as strings instead of POSIXct.

Details

Setting log_level to 0 suppresses critical warnings and messages, while 6 is the most verbose and displays all debugging information. Possible values for log_level are:

0: error
1: critical
2: warning
3: message
4: info (default)
5: debug
6: trace

Note that setting it below 2 will suppress important warnings and setting below 1 will suppress critical errors (not recommended). The default is 4.
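
As a small illustration of the levels listed above, the sketch below maps level names to their integer values and lowers the verbosity to warnings and above. The log_levels vector is a hypothetical convenience, not something mongolite exports, and the call is guarded so that the snippet also runs where mongolite is not installed.

```r
# Hypothetical lookup table mirroring the levels listed above.
log_levels <- c(error = 0L, critical = 1L, warning = 2L, message = 3L,
                info = 4L, debug = 5L, trace = 6L)

# Only touch the global option if mongolite is available.
if (requireNamespace("mongolite", quietly = TRUE)) {
  # Keep errors, critical messages and warnings; hide the more verbose levels.
  mongolite::mongo_options(log_level = log_levels[["warning"]])
}
```

Referring to levels by name rather than by number may make the intent easier to read in scripts.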

Get OID date

Description

The initial 4 bytes of a MongoDB OID contain a timestamp value, representing the ObjectId creation, measured in seconds since the Unix epoch.

Usage

oid_to_timestamp(oid)

Arguments

`oid`	string or raw value with document oid

Examples

oid_to_timestamp('5349b4ddd2781d08c09890f3')
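
To make the byte layout concrete, the following sketch decodes the leading 4 bytes by hand in base R. The function oid_seconds is hypothetical and written only for illustration; it is not necessarily how the package implements oid_to_timestamp(), and it accepts only a single 24-character hex string.

```r
# Hypothetical helper: seconds since the Unix epoch encoded in an OID string.
oid_seconds <- function(oid) {
  stopifnot(is.character(oid), length(oid) == 1, nchar(oid) == 24)
  # Split the first 8 hex digits into 4 bytes, most significant first.
  bytes <- strtoi(substring(oid, c(1, 3, 5, 7), c(2, 4, 6, 8)), base = 16L)
  # Combine as a big-endian unsigned integer; doubles avoid integer overflow.
  sum(bytes * 256^(3:0))
}

secs <- oid_seconds("5349b4ddd2781d08c09890f3")
as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
```

For the example OID, secs is 1397339357 seconds since the epoch.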

Standalone BSON reader

Description

Reads BSON data from a mongodump dump file directly into R (if it can fit in memory). This utility does not attempt to convert the result into a single data.frame: the output is always a vector of length equal to the total number of documents in the collection.

Usage

read_bson(file, as_json = FALSE, simplify = TRUE, verbose = interactive())

Arguments

`file`	path or url to a bson file
`as_json`	read data into json strings instead of R lists.
`simplify`	should nested data get simplified into atomic vectors and dataframes where possible? Only used for `as_json = FALSE`.
`verbose`	print some progress output while reading

Details

By default, the individual documents are simplified using the same rules as jsonlite. This converts nested lists into atomic vectors and data frames when possible, which makes the data easier to work with in R.

An alternative to this function is to import your BSON file into a local MongoDB server using the mongo$import() method with bson = TRUE. This requires little memory, and once the data is in MongoDB you can easily query and modify it.

Examples

diamonds <- read_bson("https://jeroen.github.io/data/diamonds.bson")
length(diamonds)

Connection SSL options

Description

Set SSL options to connect to the MongoDB server.

Usage

ssl_options(
  cert = NULL,
  key = cert,
  ca = NULL,
  ca_dir = NULL,
  crl_file = NULL,
  allow_invalid_hostname = NULL,
  weak_cert_validation = NULL
)

Arguments

`cert`	path to PEM file with client certificate, or a certificate as returned by `openssl::read_cert()`
`key`	path to PEM file with private key from the above certificate, or a key as returned by `openssl::read_key()`. This can be the same PEM file as cert.
`ca`	a certificate authority PEM file
`ca_dir`	directory with CA files
`crl_file`	file with revocations
`allow_invalid_hostname`	do not verify hostname on server certificate
`weak_cert_validation`	disable certificate verification
