ReadRdf

This repository is based on https://github.com/duckdb/extension-template. Check it out if you want to build and ship your own DuckDB extension.

This extension, ReadRdf, allows you to read RDF files directly into DuckDB. It uses the Serd library for parsing, so it can read Turtle, N-Triples, N-Quads, and TriG.

Six columns are returned for each RDF statement: graph (if present), subject, predicate, object, language_tag (if present), and datatype (if present). Columns marked "if present" are null when the associated value is absent.
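
As a rough illustration of the row shape described above, the sketch below models a single result row in Python. RdfRow and literal_kind are hypothetical names introduced for this example, and the sample values are made up rather than taken from the test data. It assumes only that SQL NULL arrives as None and that the columns come in the order listed.

```python
from typing import NamedTuple, Optional


class RdfRow(NamedTuple):
    """One result row; None stands in for SQL NULL."""
    graph: Optional[str]
    subject: str
    predicate: str
    object: str
    language_tag: Optional[str]
    datatype: Optional[str]


def literal_kind(row: RdfRow) -> str:
    """Classify a row by which optional literal columns are set."""
    if row.language_tag is not None:
        return "language-tagged"
    if row.datatype is not None:
        return "typed"
    return "plain-or-resource"


# Hypothetical rows for illustration only.
name = RdfRow(None, "http://example.org/person/JohnDoe",
              "http://xmlns.com/foaf/0.1/name", "John Doe", "en", None)
age = RdfRow(None, "http://example.org/person/JohnDoe",
             "http://xmlns.com/foaf/0.1/age", "30", None,
             "http://www.w3.org/2001/XMLSchema#integer")

print(literal_kind(name))  # language-tagged
print(literal_kind(age))   # typed
```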

Building

Managing dependencies

This project doesn't currently use vcpkg, so all discussion of it has been removed. You don't need it to build 😀

Build steps

To build the extension, first clone this repo. Then, from the root of your local clone, run:

git submodule update --init --recursive

This fetches the source for DuckDB, Serd, and the CI tools. Next, run:

make

If you have ninja available, you can use it for faster builds:

GEN=ninja make

The main binaries that will be built are:

./build/release/duckdb
./build/release/test/unittest
./build/release/extension/read_rdf/read_rdf.duckdb_extension

duckdb is the binary for the duckdb shell with the extension code automatically loaded.
unittest is the test runner of duckdb. Again, the extension is already linked into the binary.
read_rdf.duckdb_extension is the loadable binary as it would be distributed.
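
To confirm that a build produced the three binaries listed above, a small check along these lines may help. This is a minimal sketch, not part of the repository: missing_artifacts is a hypothetical helper, and it assumes a Unix-like build where the paths match the list exactly.

```python
from pathlib import Path
from typing import List

# Paths copied from the list above, relative to the repository root.
EXPECTED_ARTIFACTS = (
    "build/release/duckdb",
    "build/release/test/unittest",
    "build/release/extension/read_rdf/read_rdf.duckdb_extension",
)


def missing_artifacts(repo_root: str = ".") -> List[str]:
    """Return the expected build outputs that are not files under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_ARTIFACTS if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing build outputs:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected build outputs are present.")
```

Run it from the repository root after make finishes; an empty result means all three files exist.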

Running the extension

To run the extension code, simply start the shell with ./build/release/duckdb.

Now we can use the features from the extension directly in DuckDB. The extension provides a single table function, read_rdf(), that takes a single string argument (the name of the RDF file) and returns a table:

D select subject, predicate from read_rdf('test/rdf/tests.nt');
┌───────────────────────────────────┬─────────────────────────────────────────────────┐
│              subject              │                    predicate                    │
│              varchar              │                     varchar                     │
├───────────────────────────────────┼─────────────────────────────────────────────────┤
│ http://example.org/person/JohnDoe │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │
│ http://example.org/person/JohnDoe │ http://xmlns.com/foaf/0.1/name                  │
│ http://example.org/person/JohnDoe │ http://xmlns.com/foaf/0.1/age                   │
│ http://example.org/person/JohnDoe │ http://xmlns.com/foaf/0.1/knows                 │
│ jane                              │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │
│ jane                              │ http://xmlns.com/foaf/0.1/name                  │
│ http://example.org/book/123       │ http://purl.org/dc/elements/1.1/title           │
│ http://example.org/book/123       │ http://purl.org/dc/elements/1.1/creator         │
│ http://unicode.org/duck           │ http://example.org/hasEmoji                     │
└───────────────────────────────────┴─────────────────────────────────────────────────┘

Optional Parameters

Strict Parsing

The optional parameter strict_parsing defaults to true and exposes the strict parsing feature of the underlying Serd RDF parsing library. When set to false, the parser permits malformed URIs. To disable strict parsing, pass strict_parsing = false.
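
When calling the extension from a script, the named parameter can be added to the read_rdf() call as in the sketch below. The helpers sql_literal and read_rdf_query are hypothetical and only build the SQL text. They assume DuckDB's usual name = value syntax for table function parameters, and option names are inserted verbatim, so they should come from trusted code only.

```python
def sql_literal(value) -> str:
    """Render a Python bool or str as a SQL literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        # Double any single quotes so the string stays a valid SQL literal.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported value: {value!r}")


def read_rdf_query(path: str, **options) -> str:
    """Build a SELECT over read_rdf() with optional named parameters."""
    args = [sql_literal(path)]
    args += [f"{name} = {sql_literal(val)}" for name, val in options.items()]
    return f"SELECT * FROM read_rdf({', '.join(args)});"


print(read_rdf_query("test/rdf/tests.nt", strict_parsing=False))
# SELECT * FROM read_rdf('test/rdf/tests.nt', strict_parsing = false);
```

The resulting string can then be passed to whichever DuckDB client you use, provided the extension is loaded.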

Prefix Expansion

The optional parameter prefix_expansion defaults to false. It exposes Serd's serd_env_expand_node function, which expands URIs in CURIE form (such as foaf:name) to full URIs. When enabled, expansion is applied to all columns; it is ignored when parsing N-Triples and N-Quads.
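
To make the idea of CURIE expansion concrete, here is a conceptual sketch in Python. It is not how Serd implements serd_env_expand_node and does not call it; expand_curie and the PREFIXES table are hypothetical, and the prefix mapping would normally come from the prefix declarations in the parsed file.

```python
from typing import Dict


def expand_curie(term: str, prefixes: Dict[str, str]) -> str:
    """Expand 'prefix:local' to a full URI when the prefix is known."""
    prefix, sep, local = term.partition(":")
    # Skip terms like 'http://...' even if 'http' happened to be a prefix.
    if sep and prefix in prefixes and not local.startswith("//"):
        return prefixes[prefix] + local
    return term  # unknown prefix or already a full URI: leave unchanged


# Hypothetical prefix table for illustration.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}

print(expand_curie("foaf:name", PREFIXES))
# http://xmlns.com/foaf/0.1/name
print(expand_curie("http://example.org/book/123", PREFIXES))
# http://example.org/book/123
```

N-Triples and N-Quads have no prefix declarations, so there is nothing to expand in those formats.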

Running the tests

Tests for this extension are SQL tests in ./test/sql. They rely on sample files in the test/rdf directory. These SQL tests can be run using:

make test

Installing the deployed binaries

To install from GitHub actions:

navigate to the actions for this repo
click on the latest successful build (or build for a release)
select the architecture you want from the left hand navigation
open the Run actions/upload artifact step
find the artifact URL for the compiled extension
download, unzip and then install to DuckDB

To install your extension binaries from S3, you will need to do two things. Firstly, DuckDB should be launched with the allow_unsigned_extensions option set to true. How to set this will depend on the client you're using. Some examples:

CLI:

duckdb -unsigned

Python:

import duckdb
con = duckdb.connect(':memory:', config={'allow_unsigned_extensions': 'true'})

NodeJS:

db = new duckdb.Database(':memory:', {"allow_unsigned_extensions": "true"});

Secondly, you will need to set the repository endpoint in DuckDB to the HTTP URL of your bucket plus the version of the extension you want to install. To do this, run the following SQL query in DuckDB:

SET custom_extension_repository='bucket.s3.eu-west-1.amazonaws.com/<your_extension_name>/latest';

Note that the /latest path will allow you to install the latest extension version available for your current version of DuckDB. To install a specific version, use that version in place of latest.

After running these steps, you can install and load your extension using the regular INSTALL/LOAD commands in DuckDB:

INSTALL read_rdf;
LOAD read_rdf;
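
The steps above can be collected into a short script. The sketch below only assembles the SQL statements in order; install_statements is a hypothetical helper, and the repository value is the placeholder from the text, which you would replace with your own bucket URL.

```python
from typing import List


def install_statements(repository: str, extension: str = "read_rdf") -> List[str]:
    """Return the SQL statements that point DuckDB at a repository and load the extension."""
    repo = repository.replace("'", "''")  # keep the SQL string literal valid
    return [
        f"SET custom_extension_repository='{repo}';",
        f"INSTALL {extension};",
        f"LOAD {extension};",
    ]


# Placeholder endpoint copied from the text above; substitute your own.
REPOSITORY = "bucket.s3.eu-west-1.amazonaws.com/<your_extension_name>/latest"

for statement in install_statements(REPOSITORY):
    print(statement)

# With the third-party duckdb package installed, the statements could be run as:
#   import duckdb
#   con = duckdb.connect(':memory:', config={'allow_unsigned_extensions': 'true'})
#   for statement in install_statements(REPOSITORY):
#       con.execute(statement)
```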

If you'd like to see this listed as a community extension, please file an issue (or comment on an existing issue for the same) and if there's sufficient demand I'll try and make it happen.

Future enhancements

Potential future enhancements are:

support file globbing (e.g. a directory of RDF files)
potentially support RDF XML using libxml2 SAX parsing
