CLOCQ

CLOCQ API

Description

The CLOCQ API currently provides access to, and makes use of, the 2022-01-31 Wikidata dump. The API might be updated with the latest Wikidata dump from time to time. As a rule of thumb, we consider updating the dump on a yearly basis.

We provide an API for accessing the CLOCQ code and making use of its efficient KB representation to retrieve information from Wikidata. Please note that the API should not be used for an efficiency analysis of the method, since the API setup is not optimized in that regard. Specifically, it is not yet clear how well the API scales when accessed by multiple clients simultaneously. The Wikidata dump used by the API (downloaded on 2022-01-31) has been processed as outlined in the paper.

For simple integration into your Python code, we make a client for the CLOCQ API available.
Download CLOCQ API Client

Overall, this API has been accessed 109,348,503 times by now.

Please do not hesitate to contact us in case of any questions. Any kind of feedback is highly appreciated!

Retrieve search space (Wikidata facts) for an input string

For the given input string, disambiguate KB-items and retrieve relevant KB-facts from Wikidata. The parameters essentially control the number of disambiguations per question word (k) and the amount of potentially noisier KB-facts (p).

Parameters - via URL (direct request on URL):

- `question`:

the input string you want to get relevant Wikidata facts for.

- `k` - optional (default="AUTO"):

the number of disambiguations per question word. "AUTO" would set an individual k for each question word, based on a heuristic ambiguity measurement (default setting).

- `p` - optional (default=1000):

threshold that controls the amount of potentially noisier facts per disambiguated KB-item. Specifically, we divide the facts of a KB-item x into facts with [1] x as subject, [2] x as object, or [3] x as qualifier-object. If the total number of facts in [2]+[3] is higher than p, we consider only the facts in [1]. Otherwise, all facts are considered (i.e. [1]+[2]+[3]). This follows the intuition that facts with the KB-item in the subject position are more salient. p should not be misunderstood as a strict pruning threshold on the facts (e.g. if p=100, there may still be more than 100 facts in the result).

- `include_labels` - optional (default=True):

controls whether the items in the facts are represented as plain Wikidata IDs, or a dictionary with "id" and "label".
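The rule behind `p` can be illustrated with a small sketch. This is not the CLOCQ implementation; the function `select_facts`, the dictionary layout with the keys "subject", "object" and "qualifier_object", and the placeholder facts are all assumptions made for this example.

```python
from typing import Dict, List

Fact = List[str]  # simplified: a KB-fact as a list of (placeholder) item IDs


def select_facts(facts_by_role: Dict[str, List[Fact]], p: int = 1000) -> List[Fact]:
    """Apply the p rule to the facts of a single KB-item x.

    facts_by_role is a hypothetical structure that groups the facts of x by
    the position x takes in them: "subject", "object" or "qualifier_object".
    """
    subject = facts_by_role.get("subject", [])
    non_subject = (
        facts_by_role.get("object", []) + facts_by_role.get("qualifier_object", [])
    )
    # Too many facts in [2]+[3]: keep only the facts in [1].
    if len(non_subject) > p:
        return list(subject)
    # Otherwise keep everything, i.e. [1]+[2]+[3].
    return list(subject) + non_subject


# Placeholder facts: 150 with x as subject, 50 as object, 60 as qualifier-object.
example = {
    "subject": [["x", "some_property", "other"]] * 150,
    "object": [["other", "some_property", "x"]] * 50,
    "qualifier_object": [["other", "some_property", "other2", "qualifier", "x"]] * 60,
}

kept_small_p = select_facts(example, p=100)    # 110 > 100, subject facts only
kept_default = select_facts(example, p=1000)   # 110 <= 1000, all facts
```

With `p=100` the result still holds 150 facts, which shows why `p` is not a strict cap on the output size.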

Parameters - via JSON-object (e.g. using the CLOCQ API Client):

- `question`:

the input string you want to get relevant Wikidata facts for.

- `parameters` - optional (default: default parameters of CLOCQ):

the parameters of CLOCQ. You can add whichever parameters you would like to change. For parameters that are not included, the default value will be used. This is a dict with the following keys:

- `k` (default="AUTO"): the number of disambiguations per question word. "AUTO" would set an individual k for each question word, based on a heuristic ambiguity measurement (default setting).
- `p_setting` (default=1000): threshold that controls the amount of potentially noisier facts per disambiguated KB-item. Specifically, we divide the facts of a KB-item x into facts with [1] x as subject, [2] x as object, or [3] x as qualifier-object. If the total number of facts in [2]+[3] is higher than `p_setting`, we consider only the facts in [1]. Otherwise, all facts are considered (i.e. [1]+[2]+[3]). This follows the intuition that facts with the KB-item in the subject position are more salient. `p_setting` should not be misunderstood as a strict pruning threshold on the facts (e.g. if `p_setting`=100, there may still be more than 100 facts in the result).
- `h_match` (default=0.4): impact of matching similarity on disambiguations.
- `h_rel` (default=0.3): impact of question relevance on disambiguations.
- `h_conn` (default=0.2): impact of item-item connectivity on disambiguations.
- `h_coh` (default=0.1): impact of item-item coherence on disambiguations.
- `d` (default=20): list depth to be considered per question word.
- `bm25_limit` (default=False): amount of total output facts, as scored by BM25. `False` will deactivate BM25 scoring.

- `include_labels` - optional (default=True):

controls whether the items in the facts are represented as plain Wikidata IDs, or a dictionary with "id" and "label".
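A JSON request body with the keys listed above could be assembled as in the following sketch. The helper `build_request` is hypothetical and not part of the CLOCQ API Client; only the key names and default values are taken from the parameter list. How the body is sent (URL, HTTP method) is not covered here.

```python
import json

# Default values as documented in the parameter list above.
DEFAULT_PARAMETERS = {
    "k": "AUTO",
    "p_setting": 1000,
    "h_match": 0.4,
    "h_rel": 0.3,
    "h_conn": 0.2,
    "h_coh": 0.1,
    "d": 20,
    "bm25_limit": False,
}


def build_request(question, include_labels=True, **overrides):
    """Build a request dict; only parameters that differ from the defaults are passed."""
    unknown = set(overrides) - set(DEFAULT_PARAMETERS)
    if unknown:
        raise ValueError(f"unknown CLOCQ parameters: {sorted(unknown)}")
    payload = {"question": question, "include_labels": include_labels}
    if overrides:
        # Parameters left out fall back to their default values on the server side.
        payload["parameters"] = dict(overrides)
    return payload


request = build_request("who played in Titanic", k=3, p_setting=100)
body = json.dumps(request)  # JSON string that could be used as the request body
```

Rejecting unknown keys early is a design choice of this sketch; it catches typos such as `p` instead of `p_setting` before a request is made.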

Returns:

- `kb_item_tuple`:

list of disambiguations with keys "item" (dictionary with "id" and "label"), "question_word" (KB-item mention in input text), "rank" (number) and "score" (float).

- `search_space`:

list of KB-facts. Each KB-fact is represented as a list of KB-items (dictionary with "id" and "label").
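The returned structure can be read as in the sketch below. The `result` dictionary is hand-written to match the description above and is not real API output; the film entry uses a placeholder ID, and the helper names are made up for this example. It assumes `include_labels=True`, so that every item is a dictionary with "id" and "label".

```python
def fact_to_text(fact):
    """Join the labels of all KB-items in one fact into a readable string."""
    return ", ".join(item["label"] for item in fact)


def summarize_disambiguations(result):
    """Return one line per disambiguation, ordered by question word and rank."""
    ordered = sorted(
        result["kb_item_tuple"], key=lambda d: (d["question_word"], d["rank"])
    )
    return [
        f'{d["question_word"]} -> {d["item"]["id"]} ({d["item"]["label"]}), '
        f'rank {d["rank"]}, score {d["score"]:.2f}'
        for d in ordered
    ]


# Hand-written sample in the documented shape (values are illustrative only).
result = {
    "kb_item_tuple": [
        {
            "item": {"id": "Q38111", "label": "Leonardo DiCaprio"},
            "question_word": "dicaprio",
            "rank": 1,
            "score": 0.87,
        }
    ],
    "search_space": [
        [
            {"id": "Q_FILM", "label": "some film"},  # placeholder, not a real ID
            {"id": "P161", "label": "cast member"},
            {"id": "Q38111", "label": "Leonardo DiCaprio"},
        ]
    ],
}

lines = summarize_disambiguations(result)
facts_as_text = [fact_to_text(fact) for fact in result["search_space"]]
```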

GET /api/search_space?question=<question>&k=<k>&p=<p>

`question` is required; `k` and `p` are optional.
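A direct request URL for this endpoint can be put together with the Python standard library, as sketched below. The host in `BASE_URL` is a placeholder and must be replaced with the actual server address; the helper `search_space_url` is made up for this example.

```python
from urllib.parse import urlencode

BASE_URL = "https://clocq.example.org"  # placeholder host, NOT the real server


def search_space_url(question, k=None, p=None, base_url=BASE_URL):
    """Build the GET URL; optional parameters are only added when given."""
    params = {"question": question}
    if k is not None:
        params["k"] = k
    if p is not None:
        params["p"] = p
    # urlencode takes care of escaping spaces and special characters.
    return f"{base_url}/api/search_space?{urlencode(params)}"


url = search_space_url("who played in Titanic", k="AUTO", p=1000)
```

The resulting URL could then be fetched with any HTTP client; leaving out `k` and `p` makes the server use its defaults.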

Retrieve 1-hop neighborhood for given Wikidata-item ID

For the given KB-item, retrieve the KB-facts in which the item occurs from Wikidata.

Parameters:

- `item`:

the input KB-item, for which the KB-facts should be retrieved. Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).

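- `p` - optional (default=1000):

threshold that controls the amount of potentially noisier facts for the KB-item (see the description of `p` for the search space retrieval above).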

- `include_labels` - optional (default=True):

controls whether the items in the facts are represented as plain Wikidata IDs, or a dictionary with "id" and "label".

Returns:

A list of KB-facts. Each KB-fact is represented as a list of KB-items.
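Since the items in a fact are either plain Wikidata IDs or dictionaries, depending on `include_labels`, client code may want to handle both forms. The sketch below does that and splits a neighborhood by the position of the input item. It assumes that the first element of a fact is its subject, which is not stated above; the helper names and the sample facts (with placeholder IDs) are made up for illustration.

```python
def item_id(item):
    """Return the Wikidata ID for a plain ID string or an {"id", "label"} dict."""
    return item["id"] if isinstance(item, dict) else item


def split_by_position(facts, target_id):
    """Split facts into those with target_id as first element and all others.

    Assumption: the first element of a fact is its subject.
    """
    as_subject, elsewhere = [], []
    for fact in facts:
        ids = [item_id(item) for item in fact]
        if ids and ids[0] == target_id:
            as_subject.append(ids)
        else:
            elsewhere.append(ids)
    return as_subject, elsewhere


# Hand-written sample neighborhood mixing both item representations.
neighborhood = [
    [{"id": "Q38111", "label": "Leonardo DiCaprio"}, {"id": "P_X", "label": "some property"}, {"id": "Q_Y", "label": "some value"}],
    ["Q_FILM", "P161", "Q38111"],
]

subject_facts, other_facts = split_by_position(neighborhood, "Q38111")
```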

Run entity linking

For the given question, link the entity mentions to entities in the KB. The method was designed for the SMART 2022 Task, and the code for training the model on the SMART 2022 data can be found here.

Parameters:

- `item`:

the input question (or string), for which the entities should be linked.

- `k` - optional (default="AUTO"):

the number of entity linkings per mention. "AUTO" would set an individual k for each mention, based on a heuristic ambiguity measurement (default setting).

Returns:

A dictionary with keys "linkings" and "mentions". We return the list of "linkings", each having the KB-item (dictionary with "id" and "label") and the mention the entity was linked for. The list under the "mentions" key stores the strings generated by the sequence-to-sequence model. Note that the mentions for the individual linkings may match the generated mentions only partially.
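One way to consume this result is to group the linked entities by mention, as in the following sketch. The key names "item" and "mention" inside each linking are assumptions, since the text above only names the top-level keys; the sample `result` is hand-written and not real API output.

```python
from collections import defaultdict


def linkings_by_mention(result):
    """Map each mention to the list of Wikidata IDs linked for it."""
    grouped = defaultdict(list)
    for linking in result["linkings"]:
        # "mention" and "item" are assumed key names inside a linking.
        grouped[linking["mention"]].append(linking["item"]["id"])
    return dict(grouped)


def unmatched_generated_mentions(result):
    """Generated mentions that do not occur verbatim as a linking mention."""
    linked = {linking["mention"] for linking in result["linkings"]}
    return [mention for mention in result["mentions"] if mention not in linked]


# Hand-written sample: the generated mention matches the linked one only partially.
result = {
    "linkings": [
        {"item": {"id": "Q38111", "label": "Leonardo DiCaprio"}, "mention": "DiCaprio"}
    ],
    "mentions": ["Leonardo DiCaprio"],
}

grouped = linkings_by_mention(result)
unmatched = unmatched_generated_mentions(result)
```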

Run relation linking

For the given question, link the relation mentions to relations in the KB. The method was designed for the SMART 2022 Task. Unlike for the entity linking, no training is involved here.

Parameters:

- `item`:

the input question (or string), for which the relations should be linked.

- `top_ranked` - optional (default=True):

whether only the top-ranked relation per mention should be returned or not. If this parameter is set to `False`, multiple relations found relevant for the respective mention will be returned (improves recall, but hurts precision).

Returns:

A dictionary with keys "linkings" and "mentions". We return the list of "linkings", each having the KB-item (dictionary with "id" and "label") and the mention the relation was linked for. Here, the "mentions" key holds the union of all relation mentions that are linked in the "linkings" list.

Retrieve label for given Wikidata-item ID

Retrieve the English label for the given KB-item.

Parameters:

- `item`:

the input KB-item, for which the label should be retrieved.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).

Returns:

A string.

Retrieve aliases for given Wikidata-item ID (if available)

Retrieve the English aliases for the given KB-item, in case there are any.

Parameters:

- `item`:

the input KB-item, for which the aliases should be retrieved.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).

Returns:

A list of strings.

Retrieve description for given Wikidata-item ID (if available)

Retrieve the English description for the given KB-item, in case there is one.

Parameters:

- `item`:

the input KB-item, for which the description should be retrieved.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).

Returns:

A string.

Retrieve types for given Wikidata-item ID

Retrieve the types for the given KB-item. For humans, the occupations are added in addition to the "human" type (similar to other KBs).

Parameters:

- `item`:

the input KB-item, for which the types should be retrieved.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).

Returns:

A list of KB-items (the types), that are represented as dictionaries with keys "id" and "label".

Retrieve most frequent type for given Wikidata-item ID

Retrieves only the most frequent type for the given KB-item. This is a proxy of the most prominent type which was present in Freebase.

Parameters:

- `item`:

the input KB-item, for which the type should be retrieved.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).

Returns:

A single KB-item (the type), that is represented as a dictionary with keys "id" and "label".

Compute frequency for given Wikidata-item ID

Returns the frequency of the KB-item. The result is separated into two parts: 1) the frequency of the KB-item appearing in the subject position of a fact, and 2) the frequency of the KB-item appearing in non-subject positions of a fact (e.g. object or qualifier-object).

Parameters:

- `item`:

the input KB-item, for which the frequency should be computed.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).

Returns:

List of two positive integers.
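The two counts can be combined, for instance, into the share of facts in which the item is the subject. A minimal sketch, with a made-up helper name and made-up sample counts:

```python
def subject_share(frequency):
    """Fraction of occurrences in which the KB-item is in the subject position.

    frequency is the returned list [subject_count, non_subject_count].
    """
    subject_count, non_subject_count = frequency
    total = subject_count + non_subject_count
    # Guard against division by zero for items without any facts.
    return subject_count / total if total else 0.0


share = subject_share([30, 90])  # sample counts, not real API output
```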

Connectivity check for two Wikidata-item IDs (within 2-hops)

Computes the KB-distance between the two KB-items. The KB-distance is defined as in the paper, i.e. two KB-items are within 1-hop if they appear in the same KB-fact. Returns either 0 (no connection within 2-hops), 0.5 (connection in 2 hops), or 1 (connection within 1-hop). The order of KB-items does not matter.
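The return value can be translated into a hop count on the client side. The helper below is a sketch with a made-up name; it accepts the value as a string or as a number.

```python
def hops_from_connectivity(value):
    """Map the connectivity value to a hop count (None if not within 2 hops)."""
    score = float(value)  # accepts "0.5" as well as 0.5
    if score == 1.0:
        return 1
    if score == 0.5:
        return 2
    if score == 0.0:
        return None
    raise ValueError(f"unexpected connectivity value: {value!r}")


examples = [hops_from_connectivity(v) for v in ("1", "0.5", "0")]
```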

Parameters:

- `item1`:

the first KB-item. Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).

- `item2`:

the second KB-item. Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).

Returns:

"0", "0.5", or "1".

Shortest path(s) between two Wikidata-item IDs (if available in 2-hops)

Computes the shortest path(s) between the two KB-items. The KB-distance is defined as in the paper, i.e. two KB-items are within 1-hop if they appear in the same KB-fact. Returns a list of paths between the items. Paths are either simple KB-facts (in case the items are in 1-hop), or combinations of paths with a middle node m in between item1 and item2, such that the first part of the tuple gives facts with item1 and m, and the second part of the tuple gives facts with m and item2 (in case the items are in 2-hop).

Parameters:

- `item1`:

the first KB-item. Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).

- `item2`:

the second KB-item. Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).

Returns:

A list of paths between the items. Paths are either simple KB-facts (in case the items are in 1-hop), or combinations of paths with a middle node m in between item1 and item2, such that the first part of the tuple gives facts with item1 and m, and the second part of the tuple gives facts with m and item2 (in case the items are in 2-hop).
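The two kinds of paths can be told apart as sketched below. The concrete layout is an assumption made for this example: a 1-hop path is taken to be a single KB-fact (a list of KB-items), and a 2-hop path a pair whose first part lists facts with item1 and m and whose second part lists facts with m and item2. The helper names and sample paths (with placeholder IDs) are hypothetical.

```python
def is_kb_item(value):
    """A KB-item is either a plain ID string or a dict with an "id" key."""
    return isinstance(value, str) or (isinstance(value, dict) and "id" in value)


def path_hops(path):
    """Return 1 for a single KB-fact, 2 for a pair of fact lists (assumed layout)."""
    if len(path) > 0 and all(is_kb_item(value) for value in path):
        return 1
    if len(path) == 2 and all(isinstance(part, (list, tuple)) for part in path):
        return 2
    raise ValueError("unrecognized path layout")


# Hand-written samples with placeholder IDs; "Q_M" plays the role of the middle node m.
one_hop_path = ["Q_A", "P_1", "Q_B"]
two_hop_path = (
    [["Q_A", "P_1", "Q_M"]],  # facts with item1 and m
    [["Q_M", "P_2", "Q_B"]],  # facts with m and item2
)

hop_counts = [path_hops(one_hop_path), path_hops(two_hop_path)]
```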

Papers

Search space reduction
"Beyond NED: Fast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases", Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum. In WSDM '22, Phoenix, Arizona, 21 - 25 February 2022.
[Extended version] [Code] [Poster] [Slides] [Video] [Extended Video]

KB interface
"CLOCQ: A Toolkit for Fast and Easy Access to Knowledge Bases", Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum. In BTW '23, Dresden, Germany, 6 - 10 March 2023.

Entity and relation linking
"Question Entity and Relation Linking to Knowledge Bases via CLOCQ", Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum. In SMART '22, Hangzhou, China, 27 October, 2022.

Contact

For feedback and clarifications, please contact: Philipp Christmann (pchristm AT mmci DOT uni HYPHEN saarland DOT de), Rishiraj Saha Roy (rishiraj AT mpi HYPHEN inf DOT mpg DOT de) or Gerhard Weikum (weikum AT mpi HYPHEN inf DOT mpg DOT de).

To know more about our group, please visit https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/question-answering/.