the input string you want to get relevant Wikidata facts for.
- `k` - optional (default="AUTO"): the number of disambiguations per question word. "AUTO" sets an individual k for each question word, based on a heuristic ambiguity measurement.
- `p` - optional (default=1000): threshold that controls the number of potentially noisier facts per disambiguated KB-item. Specifically, we divide the facts of a KB-item x into facts with [1] x as subject, [2] x as object, or [3] x as qualifier-object. If the total number of facts in [2]+[3] is higher than p, we consider only the facts in [1]. Otherwise, all facts are considered (i.e. [1]+[2]+[3]). This follows the intuition that facts with the KB-item in the subject position are more salient. Note that p is not a strict pruning threshold on the facts (e.g. with p=100, the result may still contain more than 100 facts).
- `include_labels` - optional (default=True): controls whether the items in the facts are represented as plain Wikidata IDs, or as dictionaries with "id" and "label".
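
The rule behind `p` can be sketched in a few lines. This is only an illustration of the description above, not CLOCQ's actual code: the function name `select_facts` and the role keys `"subject"`, `"object"` and `"qualifier_object"` are assumptions made for this example.

```python
from typing import Dict, List

Fact = List[str]  # a fact as a list of plain Wikidata IDs


def select_facts(facts_by_role: Dict[str, List[Fact]], p: int = 1000) -> List[Fact]:
    """Apply the p rule to the facts of one KB-item (hypothetical helper).

    facts_by_role groups the facts by the position of the KB-item:
    "subject" = [1], "object" = [2], "qualifier_object" = [3].
    """
    subject = facts_by_role.get("subject", [])
    obj = facts_by_role.get("object", [])
    qualifier = facts_by_role.get("qualifier_object", [])

    # Too many facts in [2]+[3]: keep only the facts in [1].
    if len(obj) + len(qualifier) > p:
        return list(subject)

    # Otherwise keep everything, i.e. [1]+[2]+[3].
    return subject + obj + qualifier
```

With five subject facts, two object facts and one qualifier-object fact, `p=2` returns the five subject facts. The result is larger than `p`, which is why `p` is not a strict pruning threshold.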
the input string you want to get relevant Wikidata facts for.
- `parameters` - optional (default: default parameters of CLOCQ): the parameters of CLOCQ, given as a dict. Include only the parameters you would like to change; for parameters that are not included, the default value is used. The dict can have the following keys:
- `k` (default="AUTO"):
the number of disambiguations per question word. "AUTO" sets an individual k for each question word, based on a heuristic ambiguity measurement.
- `p_setting` (default=1000):
threshold that controls the number of potentially noisier facts per disambiguated KB-item. Specifically, we divide the facts of a KB-item x into facts with [1] x as subject, [2] x as object, or [3] x as qualifier-object. If the total number of facts in [2]+[3] is higher than this threshold, we consider only the facts in [1]. Otherwise, all facts are considered (i.e. [1]+[2]+[3]). This follows the intuition that facts with the KB-item in the subject position are more salient. Note that this is not a strict pruning threshold on the facts (e.g. with a value of 100, the result may still contain more than 100 facts).
- `h_match` (default=0.4): impact of matching similarity on disambiguations.
- `h_rel` (default=0.3): impact of question relevance on disambiguations.
- `h_conn` (default=0.2): impact of item-item connectivity on disambiguations.
- `h_coh` (default=0.1): impact of item-item coherence on disambiguations.
- `d` (default=20): list depth to be considered per question word.
- `bm25_limit` (default=False): total number of output facts, as scored by BM25. `False` deactivates BM25 scoring.
controls whether the items in the facts are represented as plain Wikidata IDs, or a dictionary with "id" and "label".
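
Because omitted parameters fall back to their defaults, a `parameters` dict only needs the keys you want to change. The sketch below mimics that behaviour locally using the default values listed above. `resolve_parameters` is a hypothetical helper written for this example, and rejecting unknown keys is an assumption of the sketch, not documented behaviour.

```python
# Default values as listed in the parameter description above.
DEFAULT_PARAMETERS = {
    "k": "AUTO",
    "p_setting": 1000,
    "h_match": 0.4,
    "h_rel": 0.3,
    "h_conn": 0.2,
    "h_coh": 0.1,
    "d": 20,
    "bm25_limit": False,
}


def resolve_parameters(overrides=None):
    """Return the defaults updated with the given overrides (hypothetical helper)."""
    overrides = overrides or {}
    # Assumption of this sketch: a misspelled key is an error rather than ignored.
    unknown = set(overrides) - set(DEFAULT_PARAMETERS)
    if unknown:
        raise KeyError("unknown parameter(s): " + ", ".join(sorted(unknown)))
    # Later entries win, so overrides replace the defaults key by key.
    return {**DEFAULT_PARAMETERS, **overrides}


# Only k and p_setting are changed; everything else keeps its default.
params = resolve_parameters({"k": 5, "p_setting": 100})
```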
list of disambiguations with keys "item" (dictionary with "id" and "label"), "question_word" (KB-item mention in input text), "rank" (number) and "score" (float).
- `search_space`: list of KB-facts. Each KB-fact is represented as a list of KB-items (dictionary with "id" and "label").
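
The two result lists can be processed with plain Python. The following sketch assumes the structures described above; the sample data is made up for illustration (the item "Q0000" is a placeholder, not a real result), the helper names are hypothetical, and treating a higher "score" as better is an assumption.

```python
def facts_mentioning(search_space, item_id):
    """Return the KB-facts that contain the given item ID in any position."""
    return [
        fact for fact in search_space
        if any(item["id"] == item_id for item in fact)
    ]


def best_item_per_question_word(disambiguations):
    """Map each question word to the ID of its highest-scoring item.

    Assumption of this sketch: a higher "score" means a better match.
    """
    best = {}
    for entry in disambiguations:
        word = entry["question_word"]
        if word not in best or entry["score"] > best[word]["score"]:
            best[word] = entry
    return {word: entry["item"]["id"] for word, entry in best.items()}


# Made-up sample data in the documented shape.
disambiguations = [
    {"item": {"id": "Q38111", "label": "Leonardo DiCaprio"},
     "question_word": "DiCaprio", "rank": 0, "score": 0.9},
    {"item": {"id": "P161", "label": "cast member"},
     "question_word": "cast", "rank": 0, "score": 0.7},
]
search_space = [
    [{"id": "Q0000", "label": "placeholder film"},
     {"id": "P161", "label": "cast member"},
     {"id": "Q38111", "label": "Leonardo DiCaprio"}],
]

linked = best_item_per_question_word(disambiguations)
dicaprio_facts = facts_mentioning(search_space, "Q38111")
```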
the input KB-item, for which the KB-facts should be retrieved. Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).
- `p` - optional (default=1000): threshold that controls the number of potentially noisier facts per disambiguated KB-item. Specifically, we divide the facts of a KB-item x into facts with [1] x as subject, [2] x as object, or [3] x as qualifier-object. If the total number of facts in [2]+[3] is higher than p, we consider only the facts in [1]. Otherwise, all facts are considered (i.e. [1]+[2]+[3]). This follows the intuition that facts with the KB-item in the subject position are more salient. Note that p is not a strict pruning threshold on the facts (e.g. with p=100, the result may still contain more than 100 facts).
- `include_labels` - optional (default=True): controls whether the items in the facts are represented as plain Wikidata IDs, or as dictionaries with "id" and "label".
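
To make the two representations selected by `include_labels` concrete, the sketch below converts a labelled fact into the plain-ID form. `strip_labels` is a hypothetical helper for illustration only, and the fact shown is made-up sample data with a placeholder item "Q0000".

```python
def strip_labels(facts):
    """Convert facts of {"id", "label"} dictionaries into facts of plain IDs."""
    return [[item["id"] for item in fact] for fact in facts]


# Shape with include_labels=True: every item is a dictionary.
labelled_facts = [
    [{"id": "Q0000", "label": "placeholder film"},
     {"id": "P161", "label": "cast member"},
     {"id": "Q38111", "label": "Leonardo DiCaprio"}],
]

# Shape with include_labels=False: every item is a plain Wikidata ID.
plain_facts = strip_labels(labelled_facts)
```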
the input question (or string), for which the entities should be linked.
- `k` - optional (default="AUTO"): the number of entity linkings per mention. "AUTO" sets an individual k for each mention, based on a heuristic ambiguity measurement.
the input question (or string), for which the relations should be linked.
- `top_ranked` - optional (default=True): whether to return only the top-ranked relation per mention. If this parameter is set to `False`, multiple relations found relevant for the respective mention are returned (this improves recall, but hurts precision).
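
The effect of `top_ranked` can be pictured as a filter over per-mention candidate lists. The input shape used here (a dict mapping each mention to relation IDs sorted best-first) is purely hypothetical and chosen for illustration; it is not the documented output format, and "P0000" is a placeholder ID.

```python
def select_relations(candidates_by_mention, top_ranked=True):
    """Keep one or all candidate relations per mention (hypothetical helper).

    candidates_by_mention maps a mention to its relation IDs, best first.
    """
    if top_ranked:
        # Higher precision: only the best candidate per mention survives.
        return {
            mention: relations[:1]
            for mention, relations in candidates_by_mention.items()
        }
    # Higher recall: every relevant candidate is kept.
    return {
        mention: list(relations)
        for mention, relations in candidates_by_mention.items()
    }


# Made-up candidates for a single mention.
candidates = {"cast": ["P161", "P0000"]}
only_top = select_relations(candidates)
all_relations = select_relations(candidates, top_ranked=False)
```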
the input KB-item, for which the label should be retrieved.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).
the input KB-item, for which the aliases should be retrieved.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).
the input KB-item, for which the description should be retrieved.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).
the input KB-item, for which the types should be retrieved.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).
the input KB-item, for which the type should be retrieved.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).
the input KB-item, for which the connectivity should be computed.
Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).
the first KB-item. Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).
- `item2`: the second KB-item. Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).
the first KB-item. Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).
- `item2`: the second KB-item. Examples: "Q38111" (L. DiCaprio) or "P161" (cast member).