This document describes the SWI-Prolog semweb package. The core of this package is an efficient main-memory based RDF store that is tightly connected to Prolog. Additional libraries provide reading and writing RDF/XML and Turtle data, caching loaded RDF documents and persistent storage. This package is the core of a ready-to-run platform for developing Semantic Web applications named ClioPatria, which is distributed separately. The SWI-Prolog RDF store is among the most memory-efficient main-memory stores for RDF (see http://cliopatria.swi-prolog.org/help/source/doc/home/vnc/prolog/src/ClioPatria/web/help/memusage.txt).
Version 3 of the RDF library enhances concurrent use of the library by allowing for lock-free reading and writing using short-held locks. It provides a Prolog-compatible logical update view on the triple store, and isolation using transactions and snapshots. This version of the library provides near real-time modification and querying of RDF graphs, making it particularly interesting for handling streaming RDF and graph manipulation tasks.
The core of the SWI-Prolog package semweb is an 
efficient main-memory RDF store written in C that is tightly integrated 
with Prolog. It provides a fully logical predicate rdf/3 
to query the RDF store efficiently by using multiple (currently 9) 
indexes. In addition, SWI-Prolog provides libraries for reading and 
writing RDF/XML and Turtle and a library that provides persistency using 
a combination of efficient binary snapshots and journals.
Below, we describe a few usage scenarios that guide the current design of this Prolog-based RDF store.
Application prototyping platform
Bundled with ClioPatria, the store is an efficient platform for prototyping a wide range of semantic web applications. Prolog, connected to the main-memory based store, is a productive platform for writing application logic that can be made available through the SPARQL endpoint of ClioPatria, through an application-specific API (typically based on JSON or XML), or as an HTML-based end-user application. Prolog is more versatile than SPARQL, allows composing the logic from small building blocks, and does not suffer from the object-relational impedance mismatch.
Data integration
The SWI-Prolog store is optimized for entailment on the
rdfs:subPropertyOf relation. The rdfs:subPropertyOf 
relation is crucial for integrating data from multiple sources while 
preserving the original richness of the sources because integration can 
be achieved by defining the native properties as sub-properties of 
properties from a unifying schema such as Dublin Core.
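As an illustration of this pattern, here is a minimal sketch (using hypothetical example IRIs, a hypothetical myschema prefix, and the predefined dc and rdfs prefixes) that declares a native property as a sub-property of dc:title and then queries through the Dublin Core property with rdf_has/3:

:- use_module(library(semweb/rdf_db)).

:- rdf_register_prefix(myschema, 'http://example.org/schema#').

integration_demo :-
    % Declare the native property as a sub-property of the unifying schema.
    rdf_assert(myschema:title, rdfs:subPropertyOf, dc:title),
    rdf_assert('http://example.org/book1', myschema:title,
               literal('An example title')),
    % rdf_has/3 follows rdfs:subPropertyOf, so the native triple is found.
    forall(rdf_has(S, dc:title, O),
           format('~q -> ~q~n', [S, O])).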
Dynamic data
This RDF store is one of the few stores that is primarily based on backward reasoning. The big advantage of backward reasoning is that it deals much more easily with changes to the database because it does not have to propagate the consequences of a change. Backward reasoning reduces storage requirements. The price is more reasoning during querying. In many scenarios the extra reasoning over the main-memory store will outperform fetching precomputed results from external storage.
Prototyping reasoning systems
Reasoning systems, not necessarily limited to entailment reasoning, can be prototyped efficiently on the Prolog based store. This includes `what-if' reasoning, which is supported by snapshot and transaction isolation. These features, together with the concurrent loading capabilities, make the platform well equipped to collect relevant data from large external stores for intensive reasoning. Finally, the TIPC package can be used to create networks of cooperating RDF based agents.
Streaming RDF
Transactions, snapshots, concurrent modifications and the database monitoring facilities (see rdf_monitor/2) make the platform well suited for prototyping systems that deal with streaming RDF data.
Depending on the OS and further application restrictions, the SWI-Prolog RDF store scales to about 15 million triples on 32-bit hardware. On 64-bit hardware, the scalability is limited by the amount of physical memory, allowing for approximately 4 million triples per gigabyte. The other limiting factor for practical use is the time required to load data and/or restore the database from the persistent file backup. Performance depends highly on hardware, concurrent performance and whether or not the data is spread over multiple (named) graphs that can be loaded in parallel. Restoring over 20 million triples per minute is feasible on medium hardware (Intel i7/2600 running Ubuntu 12.10).
The current `semweb' package provides two sets of interface predicates. The original set is described in section 3.2. The new API is described in section 3.1. The original API was designed when RDF was not yet standardised and did not yet support data types and language indicators. The new API is designed from the RDF 1.1 specification, introducing consistent naming and access to literals using the value space. The new API is currently defined on top of the old API, so both APIs can be mixed in a single application.
The library(semweb/rdf11) provides a new interface to 
the SWI-Prolog RDF database based on the RDF 1.1 specification.
Triples consist of the following three terms:
An IRI, represented by a Prolog atom and optionally abbreviated as Alias:Local, 
where Alias and Local are atoms. Each abbreviated IRI is expanded by the 
system to a full IRI.
Value^^Type, a type-qualified literal. For 
unknown types, Value is a Prolog string. If the type is known, the Prolog 
representations from the table below are used.
| Datatype IRI | Prolog term |
| xsd:float | float |
| xsd:double | float |
| xsd:decimal | float (1) |
| xsd:integer | integer |
| XSD integer sub-types | integer |
| xsd:boolean | true or false |
| xsd:date | date(Y,M,D) |
| xsd:dateTime | date_time(Y,M,D,HH,MM,SS) (2,3) |
| xsd:gDay | integer |
| xsd:gMonth | integer |
| xsd:gMonthDay | month_day(M,D) |
| xsd:gYear | integer |
| xsd:gYearMonth | year_month(Y,M) |
| xsd:time | time(HH,MM,SS) (2) |
Notes:
 (1) The current implementation of xsd:decimal 
values as floats is formally incorrect. Future versions of SWI-Prolog 
may introduce decimal as a subtype of rational.
(2) SS fields denote the number of seconds. This can either be an integer or a float.
 (3) The date_time structure can have a 7th 
field that denotes the timezone offset in seconds as an integer.
In addition, a ground object value is translated into a properly typed RDF literal using rdf_canonical_literal/2.
There is a fine distinction in how duplicate statements are handled in rdf/[3,4]: backtracking over rdf/3 will never return duplicate triples that appear in multiple graphs. rdf/4 will return such duplicate triples, because their graph term differs.
| S | is the subject term. It is either a blank node or IRI. | 
| P | is the predicate term. It is always an IRI. | 
| O | is the object term. It is 
either a literal, a blank node or IRI (except for true and false, which denote the values of the datatype xsd:boolean). | 
| G | is the graph term. It is always an IRI. | 
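A minimal sketch of this difference, assuming hypothetical IRIs and two graphs g1 and g2 that hold the same triple:

:- use_module(library(semweb/rdf11)).

duplicate_demo(N3, N4) :-
    S = 'http://example.org/s',
    P = 'http://example.org/p',
    rdf_assert(S, P, "v", g1),
    rdf_assert(S, P, "v", g2),
    aggregate_all(count, rdf(S, P, _), N3),     % 1 answer: duplicates collapsed
    aggregate_all(count, rdf(S, P, _, _), N4).  % 2 answers: one per graph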
inverse_of and
symmetric. See rdf_set_predicate/2.
inverse_of and
symmetric predicate properties. The version rdf_reachable/5 
maximizes the steps considered and returns the number of steps taken.
If both S and O are given, these predicates are semidet. 
The number of steps D is minimal because the implementation 
uses
breadth-first search.
Constraints on literal values
->), the 
semantics of the goal remains the same. Preferably, constraints are 
placed before the graph pattern as they often help the RDF 
database to exploit its literal indexes. In the example below, the 
database can choose between using the subject and/or predicate hash or 
the ordered literal table.
    { Date >= "2000-01-01"^^xsd:dateTime },
    rdf(S, P, Date)
The following constraints are currently defined:
>, >=, ==, =<, <
The predicates rdf_where/1 
and {}/1 are identical. The
rdf_where/1 variant is provided 
to avoid ambiguity in applications where {}/1 is used for other 
purposes. Note that it is also possible to write rdf11:{...}.
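For instance, a complete clause following the pattern above might look as follows (a sketch; the property IRI is hypothetical):

:- use_module(library(semweb/rdf11)).

recent(S) :-
    { Date >= "2000-01-01"^^xsd:dateTime },
    rdf(S, 'http://example.org/date', Date).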
For performance reasons, this does not check for compliance to the syntax defined in RFC 3987 (http://www.ietf.org/rfc/rfc3987.txt). This checks whether the term is (1) an atom and (2) not a blank node identifier.
Success of this goal does not imply that the IRI is present in the database (see rdf_iri/1 for that).
A blank node is represented by an atom that starts with
_:.
Success of this goal does not imply that the blank node is present in the database (see rdf_bnode/1 for that).
For backwards compatibility, resources represented by an atom 
that starts with __ are also considered to be blank nodes.
An RDF literal term is of the form String@LanguageTag or Value^^Datatype.
Success of this goal does not imply that the literal is well-formed or that it is present in the database (see rdf_literal/1 for that).
Success of this goal does not imply that the name is well-formed or that it is present in the database (see rdf_name/1 for that).
Success of this goal does not imply that the object term is well-formed or that it is present in the database (see rdf_object/1 for that).
Since any RDF term can appear in the object position, this is equivalent to rdf_is_term/1.
Success of this goal does not imply that the predicate term is present in the database (see rdf_predicate/1 for that).
Since only IRIs can appear in the predicate position, this is equivalent to rdf_is_iri/1.
Only blank nodes and IRIs can appear in the subject position.
Success of this goal does not imply that the subject term is present in the database (see rdf_subject/1 for that).
Since blank nodes are represented by atoms that start with `_:` and 
IRIs are atoms as well, this is equivalent to
atom(Term).
Success of this goal does not imply that the RDF term is present in the database (see rdf_term/1 for that).
| Prolog Term | Datatype IRI |
| float | xsd:double |
| integer | xsd:integer |
| string | xsd:string |
| true or false | xsd:boolean |
| date(Y,M,D) | xsd:date |
| date_time(Y,M,D,HH,MM,SS) | xsd:dateTime |
| date_time(Y,M,D,HH,MM,SS,TZ) | xsd:dateTime |
| month_day(M,D) | xsd:gMonthDay |
| year_month(Y,M) | xsd:gYearMonth |
| time(HH,MM,SS) | xsd:time |
For example:
?- rdf_canonical_literal(42, X).
X = 42^^'http://www.w3.org/2001/XMLSchema#integer'.
^^Type
Note that this ordering is a complete ordering of RDF terms that is consistent with the partial ordering defined by SPARQL.
| Diff | is one of <, = or > | 
If a type is provided using Value^^Type 
syntax, additional conversions are performed. All types accept either an 
atom or Prolog string holding a valid RDF lexical value for the type and 
xsd:float and xsd:double accept a Prolog integer.
_:. Blank nodes generated by this predicate are of the form
_:genid followed by a unique integer.
The following predicates are utilities to access RDF 1.1 collections. 
A collection is a linked list created from rdf:first and rdf:rest 
triples, ending in rdf:nil.
rdf:first and rdf:rest 
property and the list ends in rdf:nil.
If RDFTerm is unbound, RDFTerm is bound to each maximal 
RDF list. An RDF list is maximal if there is no triple rdf(_, rdf:rest, RDFList).
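A minimal sketch using hypothetical member IRIs, assuming the collection utilities rdf_assert_list/2 and rdf_list/2 of this interface: the first builds the rdf:first/rdf:rest structure, the second converts it back to a Prolog list.

:- use_module(library(semweb/rdf11)).

collection_demo(Members) :-
    rdf_assert_list(['http://example.org/a',
                     'http://example.org/b',
                     'http://example.org/c'],
                    List),
    rdf_list(List, Members).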
Implementation of the conventional human interpretation of RDF 1.1 containers.
RDF containers are open enumeration structures as opposed to RDF collections or RDF lists which are closed enumeration structures. The same resource may appear in a container more than once. A container may be contained in itself.
rdf:Alt with 
first member
Default and remaining members Others.
Notice that this construct adds no machine-processable semantics but is conventionally used to indicate to a human reader that the numerical ordering of the container membership properties of Container is intended to only be relevant in distinguishing between the first and all non-first members.
Default denotes the default option to take when choosing one of the alternatives container in Container. Others denotes the non-default options that can be chosen from.
Notice that this construct adds no machine-processable semantics but is conventionally used to indicate to a human reader that the numerical ordering of the container membership properties of Container is intended to not be significant.
Notice that this construct adds no machine-processable semantics but is conventionally used to indicate to a human reader that the numerical ordering of the container membership properties of Container is intended to be significant.
Success of this goal does not imply that Property is present in the database.
rdf(Container, P, Elem) is true and P is a container membership property.
rdf(Container, P, Elem) is true and P is the N-th (0-based) container membership property.
The central module of the RDF infrastructure is library(semweb/rdf_db). 
It provides storage and indexed querying of RDF triples. RDF data is 
stored as quintuples. The first three elements denote the RDF triple. 
The extra Graph and Line elements provide information 
about the origin of the triple.
The actual storage is provided by the foreign language (C) 
module. Using a dedicated C-based implementation we can reduce memory 
usage and improve indexing capabilities, for example by providing a 
dedicated index to support entailment over rdfs:subPropertyOf. 
Currently the following indexes are provided (S=subject, P=predicate, 
O=object, G=graph):
(rdf(R,_,_);rdf(_,_,R)) normally produces many 
duplicate answers.
library(semweb/litindex) 
provides indexed search on tokens inside literals.
literal(Value) 
if the object is a literal value. If a value of the form 
NameSpaceID:LocalName is provided it is expanded to a ground atom using expand_goal/2. 
This implies you can use this construct in compiled code without paying 
a performance penalty. Literal values take one of the following forms:
rdf:datatype
TypeID. The Value is either the textual 
representation or a natural Prolog representation. See the option 
convert_typed_literal(:Convertor) of the parser. The storage layer 
provides efficient handling of atoms, integers (64-bit) and floats 
(native C-doubles). All other data is represented as a Prolog record.
For literal querying purposes, Object can be of the form
literal(+Query, -Value), where Query is one of the terms 
below. If the Query takes a literal argument and the value has a numeric 
type, numerical comparison is performed.
icase(Text). Backward compatibility.
Backtracking never returns duplicate triples. Duplicates can be 
retrieved using rdf/4. The predicate rdf/3 
raises a type-error if called with improper arguments. If rdf/3 
is called with a term literal(_) as Subject or Predicate 
object it fails silently. This allows for graph matching goals like
rdf(S,P,O),rdf(O,P2,O2) to proceed without 
errors.
| Source | is a term Graph:Line. If Source is instantiated, passing an atom is the same as passing Atom:_. | 
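For example, a prefix search on labels with this API might look like the sketch below, using the indexed prefix query form described above and the predefined rdfs prefix:

:- use_module(library(semweb/rdf_db)).

label_with_prefix(Prefix, S, Label) :-
    rdf(S, rdfs:label, literal(prefix(Prefix), Label)).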
rdf(Subject, Predicate, Object) is 
true exploiting the rdfs:subPropertyOf predicate as well as inverse 
predicates declared using rdf_set_predicate/2 
with the
inverse_of property.
inverse_of(Pred).
symmetric(true) or inverse_of(P2) properties.
If used with either Subject or Object unbound, it first returns the origin, followed by the reachable nodes in breadth-first search order. The implementation internally looks one solution ahead and succeeds deterministically on the last solution. This predicate never generates the same node twice and is robust against cycles in the transitive relation.
With all arguments instantiated, it succeeds deterministically if a path can be found from Subject to Object. Searching starts at Subject, assuming the branching factor is normally lower. A call with both Subject and Object unbound raises an instantiation error. The following example generates all subclasses of rdfs:Resource:
?- rdf_reachable(X, rdfs:subClassOf, rdfs:'Resource').
X = 'http://www.w3.org/2000/01/rdf-schema#Resource' ;
X = 'http://www.w3.org/2000/01/rdf-schema#Class' ;
X = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property' ;
...
infinite to impose no distance-limit.
The predicates below enumerate the basic objects of the RDF store. Most of these predicates also enumerate objects that are not associated to any currently visible triple. Objects are retained as long as they are visible in active queries or snapshots. After that, some are reclaimed by the RDF garbage collector, while others are never reclaimed.
This predicate is primarily intended as a way to process all resources without processing resources twice. The user must be aware that some of the returned resources may not appear in any visible triple.
Note that resources that have rdf:type rdf:Property 
are not automatically included in the result-set of this predicate, 
while all resources that appear as the second argument of a 
triple are included.
The predicates below modify the RDF store directly. In addition, data may be loaded using rdf_load/2 or by restoring a persistent database using rdf_attach_db/2. Modifications follow the Prolog logical update view semantics, which implies that modifications remain invisible to already running queries. Further isolation can be achieved using rdf_transaction/3.
user. Subject 
and Predicate are resources. Object is either a 
resource or a term literal(Value). See rdf/3 
for an explanation of Value for typed and language qualified literals. 
All arguments are subject to name-space expansion. Complete duplicates 
(including the same graph and `line' and with a compatible `lifespan') 
are not added to the database.| Graph | is either the name of a graph (an atom) or a term Graph:Line, where Line is an integer that denotes a line number. | 
literal(Value).
The update semantics of the RDF database follows the conventional Prolog logical update view. In addition, the RDF database supports transactions and snapshots.
rdf_transaction(Goal, user, []). See rdf_transaction/3.
rdf_transaction(Goal, Id, []). See rdf_transaction/3.
library(semweb/rdf_persistency).
Processed options are:
true, which implies that an anonymous 
snapshot is created at the current state of the store. Modifications due 
to executing Goal are only visible to Goal.
snapshot option. A snapshot created outside a 
transaction exists until it is deleted. Snapshots taken inside a 
transaction can only be used inside this transaction.
_:. 
For backward compatibility reasons, __ is also considered to 
be a blank node.
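A minimal `what-if' sketch based on the snapshot option described above: Goal runs against an anonymous snapshot, so any modifications it makes are visible only inside the transaction.

:- use_module(library(semweb/rdf_db)).

what_if(Goal) :-
    rdf_transaction(Goal, what_if, [snapshot(true)]).

% Example call (hypothetical IRIs); the asserted triple is not visible
% outside the transaction:
%
%   ?- what_if(( rdf_assert(ex, p, o),
%                rdf(ex, p, o) )).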
The RDF library can read and write triples in RDF/XML and a 
proprietary binary format. There is a plugin interface defined to 
support additional formats. The library(semweb/turtle) uses 
this plugin API to support loading Turtle files using rdf_load/2.
rdf_load(FileOrList, []). See rdf_load/2.
share (default), 
equivalent blank nodes are shared in the same resource.
library(semweb/turtle) extend the 
set of recognised extensions.
true, changed 
(default) or
not_loaded.
not_modified, cached(File),
last_modified(Stamp) or unknown.
false, do not use or create a cache file.
true (default false), register xmlns 
namespace declarations or Turtle @prefix prefixes using
rdf_register_prefix/3 
if there is no conflict.
true, the message reporting completion is printed using 
level silent. Otherwise the level is informational. 
See also print_message/2.
Other options are forwarded to process_rdf/3. By default, rdf_load/2 only loads RDF/XML from files. It can be extended to load data from other formats and locations using plugins. The full set of plugins relevant to support different formats and locations is below:
:- use_module(library(semweb/turtle)).        % Turtle and TRiG
:- use_module(library(semweb/rdf_ntriples)).
:- use_module(library(semweb/rdf_zlib_plugin)).
:- use_module(library(semweb/rdf_http_plugin)).
:- use_module(library(http/http_ssl_plugin)).
rdf_save(Out, []). See rdf_save/2 for details.
write_xml_base option
true (default false), inline resources when 
encountered for the first time. Normally, only bnodes are handled this 
way.
true (default false), emit subjects sorted 
on the full URI. Useful to make file comparison easier.
false, do not include the xml:base 
declaration that is written normally when using the
base_uri option.
false (default true), never use xml 
attributes to save plain literal attributes, i.e., always use an XML 
element as in <name>Joe</name>.
| Out | Location to save the data. 
This can also be a file-url (file://path) or a stream 
wrapped in a term stream(Out). | 
Sometimes it is necessary to make more arbitrary selections of 
material to be saved or exchange RDF descriptions over an open network 
link. The predicates in this section provide for this. Character 
encoding issues are derived from the encoding of the Stream, 
providing support for
utf8, iso_latin_1 and ascii.
Save an RDF header, with the XML header, DOCTYPE, ENTITY and opening the rdf:RDF element with appropriate namespace declarations. It uses the primitives from section 3.5 to generate the required namespaces and desired short-name. Options is one of:
rdf and rdfs 
are added to the provided List. If a namespace is not 
declared, the resource is emitted in non-abbreviated form.
Loading and saving RDF format is relatively slow. For this reason we 
designed a binary format that is more compact, avoids the complications 
of the RDF parser and avoids repetitive lookup of (URL) identifiers. 
Especially the speed improvement of about 25 times is worthwhile when 
loading large databases. These predicates are used for caching by
rdf_load/2 under certain 
conditions as well as for maintaining persistent snapshots of the 
database using
library(semweb/rdf_persistency).
Many RDF stores turn triples into quadruples, and this store is no exception: initially the 4th argument stored the filename from which the triple was loaded. Currently, the 4th argument is the RDF named graph. A named graph maintains some properties, notably to track origin, changes and modified state.
modified(false).
Additional graph properties can be added by defining rules for the multifile predicate property_of_graph/2. Currently, the following extensions are defined:
library(semweb/rdf_persistency)
true if the graph is persistent.
Literal values are ordered and indexed using a skip list. The aim of this index is threefold.
library(semweb/litindex).
As string literal matching is most frequently used for searching 
purposes, the match is executed case-insensitive and after removal of 
diacritics. Case matching and diacritics removal is based on Unicode 
character properties and independent from the current locale. Case 
conversion is based on the `simple uppercase mapping' defined by Unicode 
and diacritic removal on the `decomposition type'. The approach is 
lightweight, but somewhat simpleminded for some languages. The tables 
are generated for Unicode characters up to 0x7fff. For more information, 
please check the source-code of the mapping-table generator
unicode_map.pl available in the sources of this package.
Currently, the total order of literals is based first on the type of the literal, using the ordering numeric < string < term. Numeric values (integer and float) are ordered by value; integers precede floats if they represent the same value. Strings are sorted alphabetically after case-mapping and diacritic removal as described above. If they compare equal, uppercase precedes lowercase and diacritics are ordered on their Unicode value. If they still compare equal, literals without any qualifier precede literals with a type qualifier, which precede literals with a language qualifier. Same qualifiers (both type or both language) are sorted alphabetically.
The ordered tree is used for indexed execution of
literal(prefix(Prefix), Literal) as well as literal(like(Like), Literal) 
if Like does not start with a `*'. Note that results of queries 
that use the tree index are returned in alphabetical order.
The predicates below form an experimental interface to provide more 
reasoning inside the kernel of the rdb_db engine. Note that symetric,
inverse_of and transitive are not yet 
supported by the rest of the engine. Alo note that there is no relation 
to defined RDF properties. Properties that have no triples are not 
reported by this predicate, while predicates that are involved in 
triples do not need to be defined as an instance of rdf:Property.
symmetric(true) is the same as inverse_of(Predicate), 
i.e., creating a predicate that is the inverse of itself.
inverse_of([]).
The transitive property is currently not used. The symmetric 
and inverse_of properties are considered by rdf_has/3,4 
and
rdf_reachable/3.
inverse_of(Self).
rdf_subject_branch_factor, but also considering 
triples of `subPropertyOf' this relation. See also rdf_has/3.
rdf_object_branch_factor, but also considering 
triples of `subPropertyOf' this relation. See also rdf_has/3.
Prolog code often contains references to constant resources with a 
known
prefix (also known as XML namespaces). For example,
http://www.w3.org/2000/01/rdf-schema#Class refers to the 
most general notion of an RDFS class. Readability and maintainability 
concerns call for abstraction here. The RDF database maintains a 
table of known prefixes. This table can be queried using rdf_current_ns/2 
and can be extended using rdf_register_ns/3. 
The prefix database is used to expand prefix:local terms 
that appear as arguments to calls which are known to accept a resource. 
This expansion is achieved by the Prolog preprocessor using expand_goal/2.
rdf_current_prefix(Prefix, Expansion), atom_concat(Expansion, Local, URI),
true, replace the existing namespace alias. Please note that 
replacing a namespace is dangerous as namespaces affect preprocessing. 
Make sure all code that depends on a namespace is compiled after 
changing the registration.
true and Alias is already defined, keep the original 
binding for Prefix and succeed silently.
Without options, an attempt to redefine an alias raises a permission error.
Predefined prefixes are:
true, replace the existing namespace alias. Please note that 
replacing a namespace is dangerous as namespaces affect preprocessing. 
Make sure all code that depends on a namespace is compiled after 
changing the registration.
true and Alias is already defined, keep the original 
binding for Prefix and succeed silently.
Without options, an attempt to redefine an alias raises a permission error.
Predefined prefixes are:
Explicit expansion is achieved using the predicates below. The predicate rdf_equal/2 performs this expansion at compile time, while the other predicates do it at runtime.
Note that this predicate is a meta-predicate on its output argument. This is necessary to get the module context while the first argument may be of the form (:)/2. The above mode description is correct, but should be interpreted as (?,?).
Terms of the form Prefix:Local that appear in TermIn for which Prefix is not defined are not replaced. Unlike rdf_global_id/2 and rdf_global_object/2, no error is raised.
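For example, assuming the predefined rdfs prefix:

?- rdf_global_id(rdfs:label, X).
X = 'http://www.w3.org/2000/01/rdf-schema#label'.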
Namespace handling for custom predicates
If we implement a new predicate based on one of the predicates of the semweb libraries that expands namespaces, namespace expansion is not automatically available to it. Consider the following code computing the number of distinct objects for a certain property on a certain subject.
cardinality(S, P, C) :-
      (   setof(O, rdf_has(S, P, O), Os)
      ->  length(Os, C)
      ;   C = 0
      ).
Now assume we want to write labels/2 that returns the number of distinct labels of a resource:
labels(S, C) :-
      cardinality(S, rdfs:label, C).
This code will not work because rdfs:label is not 
expanded at compile time. To make this work, we need to add an rdf_meta/1 
declaration.
:- rdf_meta
      cardinality(r,r,-).
The example below defines the rule concept/1.
:- use_module(library(semweb/rdf_db)).  % for rdf_meta
:- use_module(library(semweb/rdfs)).    % for rdfs_individual_of
:- rdf_meta
        concept(r).
%%      concept(?C) is nondet.
%
%       True if C is a concept.
concept(C) :-
        rdfs_individual_of(C, skos:'Concept').
In addition to expanding calls, rdf_meta/1 also causes expansion of clause heads for predicates that match a declaration. This is typically used to write Prolog statements about resources. The following example produces three clauses with expanded (single-atom) arguments:
:- use_module(library(semweb/rdf_db)).
:- rdf_meta
        label_predicate(r).
label_predicate(rdfs:label).
label_predicate(skos:prefLabel).
label_predicate(skos:altLabel).
This section describes the remaining predicates of the
library(semweb/rdf_db) module.
| Location | is a term File:Line. | 
When inside a transaction, Generation is unified to a term TransactionStartGen + InsideTransactionGen. E.g., 4+3 means that the transaction was started at generation 4 of the global database and we have created 3 new generations inside the transaction. Note that this choice of representation allows for comparing generations using Prolog arithmetic. Comparing a generation in one transaction with a generation in another transaction is meaningless.
triples for the interpretation of this value.
icase, substring, word, prefix 
or like. For backward compatibility, exact is 
a synonym for icase.
Major*10000 + Minor*100 + Patch.
Storing RDF triples in main memory provides much better performance than using external databases. Unfortunately, although memory is fairly cheap these days, main memory is severely limited when compared to disks. Memory usage breaks down to the following categories. Rough estimates of the memory usage are given for 64-bit systems. 32-bit systems use slightly more than half these amounts.
Bucket arrays are resized if necessary. Old triples remain at their original location. This implies that a query may need to scan multiple buckets. The garbage collector may relocate old indexed triples. It does so by copying the old triple. The old triple is later reclaimed by GC. Reindexed triples will be reused, but many reindexed triples may result in a significant memory fragmentation.
The hash parameters can be controlled with rdf_set/1. Applications that are tight on memory and for which the query characteristics are more or less known can optimize performance and memory by fixing the hash-tables. By fixing the hash-tables we can tailor them to the frequent query patterns, we avoid the need to check multiple hash buckets (see above) and we avoid memory fragmentation due to optimizing triples for resized hashes.
set_hash_parameters :-
      rdf_set(hash(s,   size, 1048576)),
      rdf_set(hash(p,   size, 1024)),
      rdf_set(hash(sp,  size, 2097152)),
      rdf_set(hash(o,   size, 1048576)),
      rdf_set(hash(po,  size, 2097152)),
      rdf_set(hash(spo, size, 2097152)),
      rdf_set(hash(g,   size, 1024)),
      rdf_set(hash(sg,  size, 1048576)),
      rdf_set(hash(pg,  size, 2048)).
s,
p, sp, o, po, spo, g, sg 
or pg. Parameter is one of:
permission_error 
exception.
The RDF store has a garbage collector that runs in a separate thread named __rdf_GC. The garbage collector removes the following objects:
rdfs:subPropertyOf relations 
that are related to old queries.
In addition, the garbage collector reindexes triples associated to 
the hash-tables before the table was resized. The most recent resize 
operation leads to the largest number of triples that require 
reindexing, while the oldest resize operation causes the largest 
slowdown. The parameter optimize_threshold controlled by rdf_set/1 
can be used to determine the number of most recent resize operations for 
which triples will not be reindexed. The default is 2.
Normally, the garbage collector does its job in the background at a low priority. The predicate rdf_gc/0 can be used to reclaim all garbage and optimize all indexes.
Warming up the database
The RDF store performs many operations lazily or in background threads. For maximum performance, perform the following steps:
warm_indexes :-
    ignore(rdf(s, _, _)),
    ignore(rdf(_, p, _)),
    ignore(rdf(_, _, o)),
    ignore(rdf(s, p, _)),
    ignore(rdf(_, p, o)),
    ignore(rdf(s, p, o)),
    ignore(rdf(_, _, _, g)),
    ignore(rdf(s, _, _, g)),
    ignore(rdf(_, p, _, g)).
Predicates:
Using rdf_gc/0 should only be needed to ensure a fully clean database for analysis purposes such as leak detection.
Duplicate marks are used to reduce the administrative load of avoiding duplicate answers. Normally, the duplicates are marked using a background thread that is started on the first query that produces a substantial amount of duplicates.
The predicate rdf_monitor/2 
allows registrations of call-backs with the RDF store. These call-backs 
are typically used to keep other databases in sync with the RDF store. 
For example,
library(semweb/rdf_persistency) monitors the RDF 
store for maintaining a persistent copy in a set of files and
library(semweb/rdf_litindex) uses added and 
deleted literal values to maintain a fulltext index of literals.
literal(Arg) of the triple's object. This event is 
introduced in version 2.5.0 of this library.
begin(Nesting) or
end(Nesting). Nesting expresses the nesting 
level of transactions, starting at `0' for a toplevel transaction. Id 
is the second argument of rdf_transaction/2. 
The following transaction Ids are pre-defined by the library:
file(Path) or stream(Stream).
file(Path).
Mask is a list of events this monitor is interested in. 
Default (empty list) is to report all events. Otherwise each element is 
of the form +Event or -Event to include or exclude monitoring for 
certain events. The event-names are the functor names of the events 
described above. The special name all refers to all events 
and
assert(load) to assert events originating from rdf_load_db/1. 
As loading triples using rdf_load_db/1 
is very fast, monitoring this at the triple level may seriously harm 
performance.
This predicate is intended to maintain derived data, such as a journal, information for undo, additional indexing in literals, etc. There is no way to remove registered monitors. If this is required one should register a monitor that maintains a dynamic list of subscribers like the XPCE broadcast library. A second subscription of the same hook predicate only re-assigns the mask.
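A minimal monitor sketch that logs assert and retract events to the console; the exact shape of the event terms (e.g. assert(S,P,O,G)) is an assumption based on the event names listed above.

:- use_module(library(semweb/rdf_db)).

:- rdf_monitor(log_rdf_event, [-all, +assert, +retract]).

log_rdf_event(Event) :-
    format(user_error, 'RDF event: ~q~n', [Event]).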
The monitor hooks are called in the order of registration and in the 
same thread that issued the database manipulation. To process all 
changes in one thread they should be sent to a thread message queue. For 
all updating events, the monitor is called while the calling thread has 
a write lock on the RDF store. This implies that these events are 
processed strictly synchronously, even if modifications originate from 
multiple threads. In particular, the transaction begin, 
... updates ... end sequence is never interleaved with 
other events. Same for load and parse.
This RDF low-level module has been created after two years of 
experimenting with a plain Prolog based module and a brief evaluation of 
a second generation pure Prolog implementation. The aim was to be able 
to handle up to about 5 million triples on standard (notebook) hardware 
and deal efficiently with subPropertyOf which was 
identified as a crucial feature of RDFS to realise fusion of different 
data-sets.
The following issues are identified and not solved in a suitable manner.
subPropertyOf of subPropertyOf is not handled.
Like subPropertyOf, it is likely to be profitable to 
handle resource identity efficiently. The current system has no support 
for it.
The library(rdf_db) module provides several hooks for 
extending its functionality. Database updates can be monitored and acted 
upon through the features described in section 
3.3. The predicate rdf_load/2 
can be hooked to deal with different formats such as Turtle, 
different input sources (e.g. http) and different strategies for caching 
results.
The hooks below are used to add new RDF file formats and sources from which to load data to the library. They are used by the modules described below and distributed with the package. Please examine the source-code if you want to add new formats or locations.
library(semweb/turtle)
library(semweb/rdf_zlib_plugin)
library(semweb/rdf_http_plugin)
library(http/http_ssl_plugin), used together with library(semweb/rdf_http_plugin) 
to load RDF from HTTPS servers.
library(semweb/rdf_persistency)
library(semweb/rdf_cache)
file(+Name), stream(+Stream) or url(Protocol, URL). If this 
hook succeeds, the RDF will be read from Stream using rdf_load_stream/3. 
Otherwise the default open functionality for file and stream are used.
xml.
owl. Format is either a built-in format (xml or triples) or a format 
understood by the rdf_load_stream/3 hook.
This module uses the library(zlib) library to load compressed 
files on the fly. The extension of the file must be .gz. 
The file format is deduced from the extension after stripping the .gz 
extension. E.g. rdf_load('file.rdf.gz').
This module allows for rdf_load('http://...'). 
It exploits the library library(http/http_open.pl). The 
format of the URL is determined from the mime-type returned by the 
server if this is one of
text/rdf+xml, application/x-turtle or
application/turtle. As RDF mime-types are not yet widely 
supported, the plugin uses the extension of the URL if the claimed 
mime-type is not one of the above. In addition, it recognises
text/html and application/xhtml+xml, scanning 
the XML content for embedded RDF.
The library library(semweb/rdf_cache) defines the 
caching strategy for triples sources. When using large RDF sources, 
caching triples greatly speeds up loading RDF documents. The cache library 
implements two caching strategies that are controlled by rdf_set_cache_options/1.
Local caching This approach applies to files only. Triples are 
cached in a sub-directory of the directory holding the source. This 
directory is called .cache (_cache on 
Windows). If the cache option create_local_directory is true, 
a cache directory is created if possible.
Global caching This approach applies to all sources, except 
for unnamed streams. Triples are cached in the directory defined by the 
cache option global_directory.
When loading an RDF file, the system scans the configured cache files 
unless cache(false) is specified as option to rdf_load/2 
or caching is disabled. If caching is enabled but no cache exists, the 
system will try to create a cache file. First it will try to do this 
locally. On failure it will try the configured global cache.
enabled(Boolean) If true, caching is 
enabled.
local_directory(Name). Plain name of local directory. 
Default .cache (_cache on Windows).
create_local_directory(Bool) If true, try 
to create local cache directories
global_directory(Dir) Writeable directory for storing 
cached parsed files.
create_global_directory(Bool) If true, try 
to create the global cache directory.
read, it returns the name of an existing file. If write 
it returns where a new cache file can be overwritten or created.
The library library(semweb/rdf_litindex.pl) exploits the 
primitives of section 4.5.1 and the 
NLP package to provide indexing on words inside literal constants. It 
also allows for fuzzy matching using stemming and `sounds-like' based on 
the double metaphone algorithm of the NLP package.
sounds(Like, 
Words), stem(Like, Words) or prefix(Prefix, 
Words). On compound expressions, only combinations that provide 
literals are returned. Below is an example after loading the ULAN 
(Unified List of Artist Names from the Getty Foundation) database 
and showing all words that sound like `rembrandt' and appear together 
in a literal with the word `Rijn'. Finding this result from the 228,710 
literals contained in ULAN requires 0.54 milliseconds (AMD 1600+).
?- rdf_token_expansions(and('Rijn', sounds(rembrandt)), L).
L = [sounds(rembrandt, ['Rambrandt', 'Reimbrant', 'Rembradt',
                        'Rembrand', 'Rembrandt', 'Rembrandtsz',
                        'Rembrant', 'Rembrants', 'Rijmbrand'])]
Here is another example, illustrating handling of diacritics:
?- rdf_token_expansions(case(cafe), L).
L = [case(cafe, [cafe, caf\'e])]
rdf_litindex:tokenization(Literal, -Tokens). On failure it 
calls tokenize_atom/2 
from the NLP package and deletes the following: atoms of length 1, 
floats, integers that are out of range and the English words and, an, or, of,
on, in, this and the. 
Deletion first calls the hook rdf_litindex:exclude_from_index(token, 
X). This hook is called as follows:
no_index_token(X) :-
        exclude_from_index(token, X), !.
no_index_token(X) :-
        ...
`Literal maps' provide a relation between literal values, intended to create additional indexes on literals. The current implementation can only deal with integers and atoms (string literals). A literal map maintains an ordered set of keys. The ordering uses the same rules as described in section 4.5. Each key is associated with an ordered set of values. Literal map objects can be shared between threads, using a locking strategy that allows for multiple concurrent readers.
Typically, this module is used together with rdf_monitor/2 
on the channels new_literal and old_literal to 
maintain an index of words that appear in a literal. Further abstraction 
using Porter stemming or Metaphone can be used to create additional 
search indices. These can map either directly to the literal values, or 
indirectly to the plain word-map. The SWI-Prolog NLP package provides 
complementary building blocks, such as a tokenizer, Porter stem and 
Double Metaphone.
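A small sketch of this usage pattern, assuming the literal-map predicates rdf_new_literal_map/1, rdf_insert_literal_map/3, rdf_find_literal_map/3 and rdf_destroy_literal_map/1 provided by library(semweb/rdf_db):

:- use_module(library(semweb/rdf_db)).

word_map_demo(Literals) :-
    rdf_new_literal_map(Map),
    rdf_insert_literal_map(Map, rembrandt, 'Rembrandt van Rijn'),
    rdf_insert_literal_map(Map, rijn, 'Rembrandt van Rijn'),
    % Find literals associated with all of the given keys.
    rdf_find_literal_map(Map, [rembrandt, rijn], Literals),
    rdf_destroy_literal_map(Map).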
rdf_litindex.pl.
not(Key). If not-terms are provided, there must be at least one positive 
keyword. The negations are tested after establishing the positive matches.
The library(semweb/rdf_persistency) 
provides reliable persistent storage for the RDF data. The store uses a 
directory with files for each source (see rdf_source/1) 
present in the database. Each source is represented by two files, one in 
binary format (see rdf_save_db/2) 
representing the base state and one represented as Prolog terms 
representing the changes made since the base state. The latter is called 
the journal.
cpu_count or 1 (one) on 
systems where this number is unknown. See also concurrent/3.
true, suppress loading messages from rdf_attach_db/2.
true, nested log transactions are added to the 
journal information. By default (false), no log-term is 
added for nested transactions.
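A typical startup sketch, attaching a hypothetical database directory; the concurrency option name is an assumption based on the loading behaviour described above.

:- use_module(library(semweb/rdf_persistency)).

:- initialization(rdf_attach_db('RDF-store', [concurrency(2)])).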
The database is locked against concurrent access using a file
lock in Directory. An attempt to attach to a 
locked database raises a permission_error exception. The 
error context contains a term rdf_locked(Args), where Args 
is a list containing time(Stamp) and pid(PID). 
The error can be caught by the application. Otherwise it prints:
ERROR: No permission to lock rdf_db `/home/jan/src/pl/packages/semweb/DB'
ERROR: locked at Wed Jun 27 15:37:35 2007 by process id 1748
false, the 
journal and snapshot for the database are deleted and further changes to 
triples associated with DB are not recorded. If Bool 
is true a snapshot is created for the current state and 
further modifications are monitored. Switching persistency does not 
affect the triples in the in-memory RDF database.
min_size(KB). Only 
journals larger than KB Kbytes are merged with the base 
state. Flushing a journal takes the following steps, ensuring a stable 
state can be recovered at any moment.
.new.
.new file over the base 
state.
Note that journals are not merged automatically for two reasons. First of all, some applications may decide never to merge as the journal contains a complete changelog of the database. Second, merging large databases can be slow and the application may wish to schedule such actions at quiet times or scheduled maintenance periods.
The above predicates suffice for most applications. The predicates in 
this section provide access to the journal files and the base state 
files and are intended to provide additional services, such as reasoning 
about the journals, loaded files, etc. A 
library library(rdf_history) is under development 
exploiting these features, supporting wiki-style editing of RDF.
Using rdf_transaction(Goal, log(Message)), we can add 
additional records to enrich the journal of affected databases with Term 
and some additional bookkeeping information. Such a transaction adds a 
term
begin(Id, Nest, Time, Message) before the change operations 
on each affected database and end(Id, Nest, Affected) after 
the change operations. Here is an example call and content of the 
journal file mydb.jrn. A full explanation of the terms that 
appear in the journal is in the description of rdf_journal_file/2.
?- rdf_transaction(rdf_assert(s,p,o,mydb), log(by(jan))).
start([time(1183540570)]).
begin(1, 0, 1183540570.36, by(jan)).
assert(s, p, o).
end(1, 0, []).
end([time(1183540578)]).
Using rdf_transaction(Goal, log(Message, DB)), where DB 
is an atom denoting a (possibly empty) named graph, the system 
guarantees that a non-empty transaction will leave a possibly empty 
transaction record in DB. This feature assumes named graphs are named 
after the user making the changes. If a user action does not affect the 
user's graph, such as deleting a triple from another graph, we still 
find record of all actions performed by some user in the journal of that 
user.
time(Stamp).
time(Stamp).
log(Message). Id is an 
integer counting the logged transactions to this database. Numbers are 
increasing and designed for binary search within the journal file.
Nest is the nesting level, where `0' is a toplevel 
transaction.
Time is a time-stamp, currently using float notation with two 
fractional digits. Message is the term provided by the user 
as argument of the log(Message) transaction.
log(Message). Id and Nest 
match the begin-term. Others gives a list of other databases 
affected by this transaction and the Id of these records. The 
terms in this list have the format DB:Id.
.trp for the base state and .jrn for the 
journal.
This module implements the Turtle language for representing the RDF triple model as defined by Dave Beckett from the Institute for Learning and Research Technology University of Bristol and later standardized by the W3C RDF working group.
This module acts as a plugin to rdf_load/2, 
for processing files with one of the extensions .ttl or .n3.
rdf(Subject, Predicate, Object [, Graph])
The representation is consistent with the SWI-Prolog RDF/XML and ntriples parsers. Provided options are:
node(1), node(2), ...
auto (default), turtle or trig. 
The auto mode switches to TRiG format if there is a
{ before the first triple. Finally, if the format is 
explicitly stated as turtle and the file appears to be a 
TRiG file, a warning is printed and the data is loaded while ignoring 
the graphs.
->IRI mapping because 
this rarely causes errors. To force strictly conforming mode, pass iri.
prefixes(Pairs). Compatibility to rdf_load/2.
[] if there is no base-uri.
warning (default), print the error and continue parsing 
the remainder of the file. If error, abort with an 
exception on the first error encountered.
on_error(warning) is active, this option can be used to 
retrieve the number of generated errors.
| Input | is one of stream(Stream), atom(Atom), 
a http, https or file URL, or a filename specification 
as accepted by absolute_file_name/3. | 
rdf(S,P,O) terms for a normal Turtle file or rdf(S,P,O,G) 
terms if the GRAPH keyword is used to associate a set of 
triples in the document with a particular graph. The Graph 
argument provides the default graph for storing the triples and Line 
is the line number where the statement started.
call(OnObject, ListOfTriples, Graph:Line)
This predicate supports the same Options as rdf_load_turtle/3.
Errors encountered are sent to print_message/2, after which the parser tries to recover and parse the remainder of the data.
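A minimal sketch of streaming use: a hypothetical callback receives each parsed batch of triples together with the Graph:Line position, following the calling convention shown above.

:- use_module(library(semweb/turtle)).

process_file(File) :-
    rdf_process_turtle(File, on_triples, []).

on_triples(Triples, _Graph:_Line) :-
    length(Triples, N),
    format('parsed ~D triples~n', [N]).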
graph(+Graph) option and instead processes one additional 
option:
encoding(utf8),
indent(0),
tab_distance(0),
subject_white_lines(1),
align_prefixes(false),
user_prefixes(false)
comment(false),
group(false),
single_line_bnodes(true)
true (default), use a for the predicate rdf:type. 
Otherwise use the full resource.
true (default false), emit numeric 
datatypes using Prolog's write to achieve canonical output.
true (default), write some informative comments between 
the output segments
true (default), using P-O and O-grouping.
true (default), inline bnodes that are used once.
true (default), omit the type if allowed by turtle.
true (default false), do not print the 
final informational message.
true (default false), write [...] and (...) 
on a single line.
true (default), use prefixes from rdf_current_prefix/2.
The option expand allows for serializing alternative 
graph representations. It is called through call/5, 
where the first argument is the expand-option, followed by S,P,O,G. G is 
the graph-option (which is by default a variable). This notably allows 
for writing RDF graphs represented as rdf(S,P,O) using the 
following code fragment:
triple_in(RDF, S,P,O,_G) :-
    member(rdf(S,P,O), RDF).
    ...,
    rdf_save_turtle(Out, [ expand(triple_in(RDF)) ]),
| Out | is one of stream(Stream), 
a stream handle, a file-URL or an atom that denotes a filename. | 
The library(semweb/rdf_ntriples) provides a fast reader 
for the RDF N-Triples and N-Quads format. N-Triples is a simple format, 
originally used to support the W3C RDF test suites. The current format 
has been extended and is a subset of the Turtle format (see
library(semweb/turtle)).
The API of this library is almost identical to library(semweb/turtle). 
This module provides a plugin into rdf_load/2, 
making this predicate support the format ntriples and nquads.
| Triple | is a term triple(Subject,Predicate,Object). 
Arguments follow the normal conventions of the RDF libraries. NodeID 
elements are mapped to node(Id). If end-of-file is reached, Triple 
is unified with end_of_file. | 
| Quad | is a term quad(Subject,Predicate,Object,Graph). 
Arguments follow the normal conventions of the RDF libraries. NodeID 
elements are mapped to node(Id). If end-of-file is reached, Quad 
is unified with end_of_file. | 
triple(Subject,Predicate,Object)
quad(Subject,Predicate,Object,Graph).
node(_), bnodes are returned as node(Id).
:<baseuri>_
warning (default) or error
on_error is warning, unify Count 
with the number of errors.
| Triples | is a list of rdf(Subject, Predicate, Object) | 
| Quads | is a list of rdf(Subject, Predicate, Object, Graph) | 
graph(Graph).
| CallBack | is called as call(CallBack, Triples, Graph), 
where Triples is a list holding a single rdf(S,P,O) triple. 
Graph is passed from the graph option and is unbound if this 
option is omitted. | 
ntriples and nquads formats.
nt, ntriples and nquads.
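A sketch of reading an N-Triples stream triple by triple until end_of_file, following the conventions described above; read_ntriple/2 is assumed to be the per-triple reader of this library.

:- use_module(library(semweb/rdf_ntriples)).

read_all_triples(In, Triples) :-
    read_ntriple(In, Triple),
    (   Triple == end_of_file
    ->  Triples = []
    ;   Triples = [Triple|Rest],
        read_all_triples(In, Rest)
    ).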
This module implements extraction of RDFa triples from parsed XML or HTML documents. It has two interfaces: read_rdfa/3 to read triples from some input (stream, file, URL) and xml_rdfa/3 to extract triples from an HTML or XML document that is already parsed with load_html/3 or load_xml/3.
rdf(S,P,O) 
triples extracted from
Input. Input is either a stream, a file name, a 
URL referencing a file name or a URL that is valid for http_open/3. Options 
are passed to open/4, http_open/3 
and xml_rdfa/3. If no base is 
provided in Options, a base is deduced from Input.rdf(S,P,O) terms 
extracted from DOM according to the RDFa specification. Options 
processed:
lang
vocab
library(semweb/rdfa) as loader for HTML RDFa 
files.
The library(semweb/rdfs) 
library adds interpretation of the triple store in terms of concepts 
from RDF-Schema (RDFS). There are two ways to provide support for more 
high level languages in RDF. One is to view such languages as a set of entailment 
rules. In this model the rdfs library would provide a predicate rdfs/3 
providing the same functionality as rdf/3 
on the union of the raw graph and triples that can be derived by applying 
the RDFS entailment rules.
Alternatively, RDFS provides a view on the RDF store in terms of 
individuals, classes, properties, etc., and we can provide predicates 
that query the database with this view in mind. This is the approach 
taken in the library(semweb/rdfs.pl) library, providing 
calls like
rdfs_individual_of(?Resource, ?Class). The 
SeRQL language is based on querying the deductive closure of the triple 
set. The SWI-Prolog SeRQL library provides entailment modules 
that take the approach outlined above.
The predicates in this section explore the rdfs:subPropertyOf,
rdfs:subClassOf and rdf:type relations. Note 
that the most fundamental of these, rdfs:subPropertyOf, is 
also used by rdf_has/[3,4].
rdfs:subPropertyOf relation. It can be used to test as well 
as generate sub-properties or super-properties. Note that the commonly 
used semantics of this predicate is wired into rdf_has/[3,4]. Bug: the 
current implementation cannot deal with cycles. Bug: the current 
implementation cannot deal with predicates that are an rdfs:subPropertyOf 
of rdfs:subPropertyOf, such as owl:samePropertyAs.
rdfs:subClassOf relation. It can be used to test as well as 
generate sub-classes or super-classes. Bug: the current implementation 
cannot deal with cycles.
rdf:type property that refers to Class or a sub-class thereof. Can be 
used to test, generate classes Resource belongs to, or generate 
individuals described by Class.
The RDF construct rdf:parseType=Collection 
constructs a list using the rdf:first and rdf:rest relations.
rdf:List or rdfs:Container.
rdf:List into a Prolog list of objects.
user.
Complex projects require RDF resources from many locations and 
typically wish to load these in different combinations. For example 
loading a small subset of the data for debugging purposes or loading a 
different set of files for experimentation. The library library(semweb/rdf_library.pl) 
manages sets of RDF files spread over different locations, including 
file and network locations. The original version of this library 
supported metadata about collections of RDF sources in an RDF file 
called Manifest. The current version supports both the
VoID format and the 
original format. VoID files (typically named void.ttl) can 
use elements from the RDF Manifest vocabulary to support features that 
are not supported by VoID.
A manifest file is an RDF file, often in
Turtle 
format, that provides meta-data about RDF resources. Often, a manifest 
will describe RDF files in the current directory, but it can also 
describe RDF resources at arbitrary URL locations. The RDF schema for 
RDF library meta-data can be found in rdf_library.ttl. The 
namespace for the RDF library format is defined as
http://www.swi-prolog.org/rdf/library/ 
and abbreviated as
lib.
The schema defines three root classes: lib:Namespace, lib:Ontology and lib:Virtual, which we describe below.
/wn-basic 
and wn-full as virtual resources. The lib:Virtual resource 
is used as a second rdf:type:
<wn-basic>
        a lib:Ontology ;
        a lib:Virtual ;
        ...
@prefix lib: <http://www.swi-prolog.org/rdf/library/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

[ a lib:Namespace ;
  lib:mnemonic "rdfs" ;
  lib:namespace rdfs:
] .
VoID aims at resolving the same problem as the Manifest files described here. In addition, the VANN vocabulary provides information about preferred namespace prefixes. The RDF library manager can deal with VoID files. The following relations apply:
Dataset and Linkset are similar to
lib:Ontology, but a VoID resource is always
Virtual. I.e., the VoID URI itself never refers to an RDF 
document.
owl:imports and its lib specializations are 
replaced by void:subset (referring to another VoID dataset) 
and void:dataDump (referring to a concrete document).
dcterms:description 
rather than rdfs:comment
lib:source, lib:baseURI 
and lib:Cloudnode, which have no equivalent in VoID.
vann:preferredNamespacePrefix 
and
vann:preferredNamespaceUri as alternatives to its 
proprietary way for defining prefixes. The domain of these predicates is 
unclear. The library recognises them regardless of the domain. Note that 
the range of vann:preferredNamespaceUri is a literal. 
A disadvantage of that is that the Turtle prefix declaration cannot be 
reused.
Currently, the RDF metadata is not stored in the RDF database. It is processed by low-level primitives that do not perform RDFS reasoning. In particular, this means that rdfs:subPropertyOf and rdfs:subClassOf cannot be used to specialise the RDF meta vocabulary.
The initial metadata file(s) are loaded into the system using rdf_attach_library/1.
void.ttl,
Manifest.ttl or Manifest.rdf is loaded (in 
this order of preference).
Declared namespaces are added to the rdf-db namespace list. 
Encountered ontologies are added to a private database of
rdf_list_library.pl. Each ontology is given an
identifier, derived from the basename of the URL without the 
extension. Thus, using the declaration below, the identifier of the 
declared ontology is wn-basic.
<wn-basic>
        a void:Dataset ;
        dcterms:title "Basic WordNet" ;
        ...
It is possible for the initial set of manifests to refer to RDF files that are not covered by a manifest. If such a reference is encountered while loading or listing a library, the library manager will look for a manifest file in the directory holding the referenced RDF file and load this manifest. If a manifest is found that covers the referenced file, the directives found in the manifest will be followed. Otherwise the RDF resource is simply loaded using the current defaults.
Further exploration of the library is achieved using rdf_list_library/1 or rdf_list_library/2:
?- rdf_list_library(Id, []).

Typically, a project will use a single file using the same format as a manifest file that defines alternative configurations that can be loaded. This file is loaded at program startup using rdf_attach_library/1. Users can now list the available libraries using rdf_list_library/0 and rdf_list_library/1:
1 ?- rdf_list_library.
ec-core-vocabularies    E-Culture core vocabularies
ec-all-vocabularies     All E-Culture vocabularies
ec-hacks                Specific hacks
ec-mappings             E-Culture ontology mappings
ec-core-collections     E-Culture core collections
ec-all-collections      E-Culture all collections
ec-medium               E-Culture medium sized data (artchive+aria)
ec-all                  E-Culture all data
Now we can list a specific category using rdf_list_library/1. 
Note that this loads two additional manifests referenced by resources 
encountered in ec-mappings. If a resource does not exist, it is flagged 
with [NOT FOUND].
2 ?- rdf_list_library('ec-mappings').
% Loaded RDF manifest /home/jan/src/eculture/vocabularies/mappings/Manifest.ttl
% Loaded RDF manifest /home/jan/src/eculture/collections/aul/Manifest.ttl
<file:///home/jan/src/eculture/src/server/ec-mappings>
. <file:///home/jan/src/eculture/vocabularies/mappings/mappings>
. . <file:///home/jan/src/eculture/vocabularies/mappings/interface>
. . . file:///home/jan/src/eculture/vocabularies/mappings/interface_class_mapping.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/interface_property_mapping.ttl
. . <file:///home/jan/src/eculture/vocabularies/mappings/properties>
. . . file:///home/jan/src/eculture/vocabularies/mappings/ethnographic_property_mapping.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_properties.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_property_semantics.ttl
. . <file:///home/jan/src/eculture/vocabularies/mappings/situations>
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_situations.ttl
. <file:///home/jan/src/eculture/collections/aul/aul>
. . file:///home/jan/src/eculture/collections/aul/aul.rdfs
. . file:///home/jan/src/eculture/collections/aul/aul.rdf
. . file:///home/jan/src/eculture/collections/aul/aul9styles.rdf
. . file:///home/jan/src/eculture/collections/aul/extractedperiods.rdf
. . file:///home/jan/src/eculture/collections/aul/manual-periods.rdf
Resources and manifests are located either on the local filesystem or 
on a network resource. The initial manifest can also be loaded from a 
file or a URL. This defines the initial base URL of the 
document. The base URL can be overruled using the Turtle @base 
directive. Other documents can be referenced relative to this base URL 
by exploiting Turtle's URI expansion rules. Turtle resources can be 
specified in three ways: as absolute URLs (e.g. <http://www.example.com/rdf/ontology.rdf>), 
as URLs relative to the base (e.g. <../rdf/ontology.rdf>) 
or relative to a prefix (e.g. prefix:ontology).
The prefix notation is powerful because we can define multiple prefixes and 
specify resources relative to them. Unfortunately, prefixes can only be 
defined as absolute URLs or as URLs relative to the base URL. Notably, they 
cannot be defined relative to other prefixes. In addition, a prefix can 
only be followed by a Qname, which excludes . and /.
Easily relocatable manifests must define all resources relative to the base URL. Relocation is automatic if the manifest remains in the same hierarchy as the resources it references. If the manifest is copied elsewhere (e.g., for creating a local version), it can use @base to refer to the resource hierarchy. We can point to directories holding manifest files using @prefix declarations. There, we can reference Virtual resources using prefix:name. Here is an example, where we first give some lines from the initial manifest, followed by the definition of the virtual RDFS resource.
@base <http://gollem.science.uva.nl/e-culture/rdf/> .
@prefix base:           <base_ontologies/> .
<ec-core-vocabularies>
        a lib:Ontology ;
        a lib:Virtual ;
        dc:title "E-Culture core vocabularies" ;
        owl:imports
                base:rdfs ,
                base:owl ,
                base:dc ,
                base:vra ,
                ...
<rdfs>
        a lib:Schema ;
        a lib:Virtual ;
        rdfs:comment "RDF Schema" ;
        lib:source rdfs: ;
        lib:schema <rdfs.rdfs> .
In this section we provide skeleton code for filling the RDF database from a password-protected HTTP repository. The first line loads the application. Next, we include modules that enable us to manage the RDF library, RDF database caching and HTTP connections. Then we set up HTTP authentication, enable caching of processed RDF files and load the initial manifest. Finally, load_data/0 loads all our RDF data.
:- use_module(server).
:- use_module(library(http/http_open)).
:- use_module(library(semweb/rdf_library)).
:- use_module(library(semweb/rdf_cache)).
:- http_set_authorization('http://www.example.org/rdf',
                          basic(john, secret)).
:- rdf_set_cache_options([ global_directory('RDF-Cache'),
                           create_global_directory(true)
                         ]).
:- rdf_attach_library('http://www.example.org/rdf/Manifest.ttl').
%%      load_data
%
%       Load our RDF data
load_data :-
        rdf_load_library('all').
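A hedged usage sketch: after consulting the file above, the data is loaded with load_data/0 and can be inspected with rdf_statistics/1 (the triple count shown is purely illustrative):

?- load_data.
true.

?- rdf_statistics(triples(Count)).
Count = 1203542.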
The VoID metadata below allows for loading WordNet in one of two predefined versions using one of the following queries:
?- rdf_load_library('wn-basic', []).
?- rdf_load_library('wn-full', []).
@prefix    void: <http://rdfs.org/ns/void#> .
@prefix    vann: <http://purl.org/vocab/vann/> .
@prefix     lib: <http://www.swi-prolog.org/rdf/library/> .
@prefix     owl: <http://www.w3.org/2002/07/owl#> .
@prefix     rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix    rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix     xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix      dc: <http://purl.org/dc/terms/> .
@prefix   wn20s: <http://www.w3.org/2006/03/wn/wn20/schema/> .
@prefix   wn20i: <http://www.w3.org/2006/03/wn/wn20/instances/> .
[ vann:preferredNamespacePrefix "wn20i" ;
  vann:preferredNamespaceUri "http://www.w3.org/2006/03/wn/wn20/instances/"
] .
[ vann:preferredNamespacePrefix "wn20s" ;
  vann:preferredNamespaceUri "http://www.w3.org/2006/03/wn/wn20/schema/"
] .
<wn20-common>
        a void:Dataset ;
        dc:description "Common files between full and basic version" ;
        lib:source wn20i: ;
        void:dataDump
                <wordnet-attribute.rdf.gz> ,
                <wordnet-causes.rdf.gz> ,
                <wordnet-classifiedby.rdf.gz> ,
                <wordnet-entailment.rdf.gz> ,
                <wordnet-glossary.rdf.gz> ,
                <wordnet-hyponym.rdf.gz> ,
                <wordnet-membermeronym.rdf.gz> ,
                <wordnet-partmeronym.rdf.gz> ,
                <wordnet-sameverbgroupas.rdf.gz> ,
                <wordnet-similarity.rdf.gz> ,
                <wordnet-synset.rdf.gz> ,
                <wordnet-substancemeronym.rdf.gz> ,
                <wordnet-senselabels.rdf.gz> .
<wn20-skos>
        a void:Dataset ;
        void:subset <wnskosmap> ;
        void:dataDump <wnSkosInScheme.ttl.gz> .
<wnskosmap>
        a lib:Schema ;
        lib:source wn20s: ;
        void:dataDump
                <wnskosmap.rdfs> .
<wnbasic-schema>
        a void:Dataset ;
        lib:source wn20s: ;
        void:dataDump
                <wnbasic.rdfs> .
<wn20-basic>
        a void:Dataset ;
        a lib:CloudNode ;
        dc:title "Basic WordNet" ;
        dc:description "Light version of W3C WordNet" ;
        owl:versionInfo "2.0" ;
        lib:source wn20i: ;
        void:subset
                <wnbasic-schema> ,
                <wn20-skos> ,
                <wn20-common> .
<wnfull-schema>
        a void:Dataset ;
        lib:source wn20s: ;
        void:dataDump
                <wnfull.rdfs> .
<wn20-full>
        a void:Dataset ;
        a lib:CloudNode ;
        dc:title "Full WordNet" ;
        dc:description "Full version of W3C WordNet" ;
        owl:versionInfo "2.0" ;
        lib:source wn20i: ;
        void:subset
                <wnfull-schema> ,
                <wn20-skos> ,
                <wn20-common> ;
        void:dataDump
                <wordnet-antonym.rdf.gz> ,
                <wordnet-derivationallyrelated.rdf.gz> ,
                <wordnet-participleof.rdf.gz> ,
                <wordnet-pertainsto.rdf.gz> ,
                <wordnet-seealso.rdf.gz> ,
                <wordnet-wordsensesandwords.rdf.gz> ,
                <wordnet-frame.rdf.gz> .
This module provides a SPARQL client. For example:
?- sparql_query('select * where { ?x rdfs:label "Amsterdam" }', Row,
                [ host('dbpedia.org'), path('/sparql/')]).
Row = row('http://www.ontologyportal.org/WordNet#WN30-108949737') ;
false.
Or, querying a local server using an ASK query:
?- sparql_query('ask { owl:Class rdfs:label "Class" }', Row,
                [ host('localhost'), port(3020), path('/sparql/')]).
Row = true.
The second argument is unified to a term rdf(S,P,O) for CONSTRUCT and DESCRIBE 
queries, row(...) for SELECT queries and true or false 
for ASK queries.
Options include host(Host), port(Port) and path(Path) to address the 
endpoint, search(Parms) to pass additional query parameters and 
variable_names(Names) to obtain the names of the variables in a SELECT query. 
Remaining options are passed to http_open/3. 
The defaults for Host, Port and Path can be set using sparql_set_server/1. 
The initial default for port is 80 and path is /sparql/.
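For instance, a hedged sketch that combines these options, assuming variable_names(Names) behaves as described above:

?- sparql_query('select * where { ?s ?p ?o } limit 2',
                Row,
                [ host('dbpedia.org'), path('/sparql/'),
                  variable_names(Names)
                ]).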
For example, the ClioPatria server understands the parameter
entailment. The code below queries for all triples using 
rdfs entailment.
?- sparql_query('select * where { ?s ?p ?o }',
                Row,
                [ search([entailment=rdfs])
                ]).
    sparql_set_server([ host(localhost),
                        port(8080),
                        path(world)
                      ])
The default for port is 80 and path is /sparql/.
The variable names of a SELECT result are returned as a term v(Name, ...) and Rows 
is a list of row(...) terms containing the column values in the 
same order as the variable names. ASK queries yield true or false.
This library provides predicates that compare RDF graphs. The current version only provides one predicate: rdf_equal_graphs/3 verifies that two graphs are identical after proper labeling of the blank nodes.
Future versions of this library may contain more advanced operations, such as diffing two graphs.
| GraphA       | is a list of rdf(S,P,O) terms    | 
| GraphB       | is a list of rdf(S,P,O) terms    | 
| Substitution | is a list of NodeA = NodeB terms | 
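A minimal sketch of calling rdf_equal_graphs/3 on two graphs that differ only in their blank node names (assuming atoms starting with '_:' are recognised as blank nodes):

:- use_module(library(semweb/rdf_compare)).

compare_graphs_demo(Substitution) :-
        GraphA = [ rdf('http://example.org/s', 'http://example.org/p', '_:x'),
                   rdf('_:x', 'http://example.org/q', 'http://example.org/o')
                 ],
        GraphB = [ rdf('http://example.org/s', 'http://example.org/p', '_:y'),
                   rdf('_:y', 'http://example.org/q', 'http://example.org/o')
                 ],
        rdf_equal_graphs(GraphA, GraphB, Substitution).

On success, Substitution should contain a pair relating the two blank nodes.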
This module defines rules for user:portray/1 to help tracing and debugging RDF resources by printing them in a more concise representation and optionally adding a comment from the label field to help the user interpret the URL. The main predicate is rdf_portray_as/1, which selects the display style; the available styles are prefix:id, writeq, prefix:label and prefix:id=label.
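A hedged sketch of enabling this during a debugging session, assuming rdf_portray_as/1 accepts the styles listed above:

:- use_module(library(semweb/rdf_portray)).

% Print resources as prefix:label at the top level and in the debugger.
:- rdf_portray_as(prefix:label).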
The core infrastructure for storing and querying RDF is provided by this package, which is distributed as a core package with SWI-Prolog. ClioPatria provides a comprehensive server infrastructure on top of the semweb and http packages. ClioPatria provides a SPARQL 1.1 endpoint, linked open data (LOD) support, user management, a web interface and an extension infrastructure for programming (semantic) web applications.
Thea provides access to OWL ontologies at the level of the abstract syntax. It can interact with external DL reasoners using DIG.
RDF-DB version 3 is a major redesign of the SWI-Prolog RDF infrastructure. Nevertheless, version 3 is almost perfectly upward compatible with version 2. Below are some issues to take into consideration when upgrading.
Version 2 did not allow for modifications while read operations were in progress, for example due to an open choice point. As a consequence, operations that both queried and modified the database had to be wrapped in a transaction or the modifications had to be buffered as Prolog data structures. In both cases, the RDF store was not modified during the query phase. In version 3, modifications are allowed while read operations are in progress and follow the Prolog logical update view semantics. This is different from using a transaction in version 2, where the view for all read operations was frozen at the start of the transaction. In version 3, every read operation sees the store frozen at the moment that the operation was started.
We illustrate the difference by writing a forwards entailment rule that adds a sibling relation. In version 2, we could perform this operation using one of the following:
add_siblings_1 :-
        findall(S-O,
                ( rdf(S, f:parent, P),
                  rdf(O, f:parent, P),
                  S \== O
                ),
                Pairs),
        forall(member(S-O, Pairs), rdf_assert(S,f:sibling,O)).
add_siblings_2 :-
        rdf_transaction(
            forall(( rdf(S, f:parent, P),
                     rdf(O, f:parent, P),
                     S \== O
                   ),
                   rdf_assert(S, f:sibling, O))).
In version 3, we can write this in the natural Prolog style below. In itself, this may not seem a big advantage because wrapping such operations in a transaction is often a good style anyway. The story changes with more complicated control structures that combine iterations with steps that depend on triples asserted in previous steps. Such scenarios can be programmed naturally in the current version.
add_siblings_3 :-
        forall(( rdf(S, f:parent, P),
                 rdf(O, f:parent, P),
                 S \== O
               ),
               rdf_assert(S, f:sibling, O)).
In version 3, code that combines queries with modification has the same semantics whether executed inside or outside a transaction. This property makes reusing such predicates predictable.
The keys reported by rdf_statistics/1 have changed as follows; a small query sketch is shown after the list.
sources is renamed to graphs
triples_by_file is renamed to triples_by_graph
gc has additional arguments
core is removed
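A hedged sketch that queries the renamed keys, assuming graphs(Count) and triples_by_graph(Graph, Count) enumerate as rdf_statistics/1 terms:

:- use_module(library(semweb/rdf_db)).

% Print the number of graphs and the triple count per graph.
print_graph_statistics :-
        rdf_statistics(graphs(Count)),
        format('Graphs loaded: ~w~n', [Count]),
        forall(rdf_statistics(triples_by_graph(Graph, Triples)),
               format('  ~w: ~D triples~n', [Graph, Triples])).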
This research was supported by the following projects: MIA and MultimediaN project (www.multimedian.nl) funded through the BSIK programme of the Dutch Government, the FP-6 project HOPS of the European Commission, the COMBINE project supported by the ONR Global NICOP grant N62909-11-1-7060 and the Dutch national program COMMIT.