Several datasets are commonly used as benchmarks for knowledge embedding, including FB15K, FB13, WN18 and WN11. We use FB15K and WN18 as examples to introduce the input file format of our framework.
Datasets are required in the following format, containing five files (a minimal loading sketch is given after the download links below).
The original data can also be downloaded from:
FB15K and WN18 were released with the paper "Translating Embeddings for Modeling Multi-relational Data" (2013). [download]
FB13 and WN11 were released with the paper "Reasoning With Neural Tensor Networks for Knowledge Base Completion". [download]
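For illustration only, here is a minimal loading sketch that assumes the common id-based triple layout, where the first line of a triple file gives the number of triples and each following line holds the ids of a head entity, a tail entity and a relation. The file name train2id.txt and the exact layout are assumptions, not part of the format specification above.
#A minimal sketch for loading an id-based triple file; file name and layout are assumptions.
def load_triples(path):
    with open(path) as f:
        n = int(f.readline())                                   # first line: number of triples
        triples = [tuple(map(int, f.readline().split())) for _ in range(n)]
    return triples

train = load_triples("./FB15K/train2id.txt")                    # hypothetical path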
We provide several toolkits for knowledge embedding, comprising the following four repositories:
This is an efficient implementation based on TensorFlow for knowledge representation learning (KRL). We use C++ to implement some underlying operations such as data preprocessing and negative sampling. Each specific model is implemented with TensorFlow and exposed through Python interfaces, providing a convenient platform for running models on GPUs.
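As a rough illustration of what such negative sampling does, here is a minimal generic sketch (not OpenKE's actual C++ code) that corrupts a positive triple by replacing its head or tail with a random entity:
#A generic negative-sampling sketch, not OpenKE's C++ implementation.
import random

def corrupt(triple, num_entities):
    h, t, r = triple
    if random.random() < 0.5:
        h = random.randrange(num_entities)          # replace the head entity
    else:
        t = random.randrange(num_entities)          # replace the tail entity
    return (h, t, r)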
OpenKE provides simple interfaces for training and testing various KRL models, sparing users redundant data processing and memory management. OpenKE implements a number of classic and effective knowledge embedding models, including:
We provide tutorials for training these models. Additionally, we use some simple examples to show how to build a new model based on OpenKE.
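As a reference point for these models, below is a minimal NumPy sketch of the TransE scoring function, score(h, r, t) = ||h + r - t||. It is a generic illustration rather than OpenKE's internal implementation, and the toy embeddings are placeholders.
#A generic TransE scoring sketch; the toy embeddings below are placeholders.
import numpy as np

def transe_score(ent_emb, rel_emb, h, t, r, norm=1):
    #A triple (h, r, t) is plausible when the translated head h + r is close to t.
    return np.linalg.norm(ent_emb[h] + rel_emb[r] - ent_emb[t], ord=norm)

ent_emb = np.random.randn(100, 50).astype('float32')    # 100 toy entities, dimension 50
rel_emb = np.random.randn(20, 50).astype('float32')     # 20 toy relations
print(transe_score(ent_emb, rel_emb, h=3, t=7, r=1))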
KB2E is our early implementation of several knowledge embedding models, and many of its resources are used in our subsequent works. These codes will be gradually integrated into the new framework OpenKE. KB2E is a basic and stable knowledge graph embedding toolkit, including TransE, TransH, TransR and PTransE. The implementation follows the settings described in the original papers of these models, which makes it reliable for experiments in research work.
This is an efficient lightweight implementation of TransE and its extended models for knowledge representation learning, including TransH, TransR, TransD, TranSparse and PTransE. The underlying design has been reworked for acceleration and supports multi-threaded training. Fast-TransX is designed for simple and quick deployment and follows the overall framework of OpenKE.
This is a light and simple version of OpenKE based on TensorFlow, including TransE, TransH, TransR and TransD. Similar to Fast-TransX, TensorFlow-TransX avoids complicated encapsulation while following the overall framework of OpenKE.
We provide pretrained embeddings of existing large-scale knowledge graphs trained with OpenKE (all embeddings are currently trained with TransE; more models will be released if necessary).
The knowledge graphs and embeddings contain the following five files:
File descriptions and download links:
| Knowledge Graph | Description | Size | Download |
|---|---|---|---|
| Wikidata | Embeddings of the entities | > 4GB | Download |
| | Embeddings of the relations | < 1MB | |
| | List of the entity ids | 360MB | |
| | List of the relation ids | < 1MB | |
| | List of the triple ids | 1GB | |
| Freebase | Embeddings of the entities | > 15GB | Download |
| | Embeddings of the relations | < 10MB | |
| | List of the entity ids | 1.5GB | |
| | List of the relation ids | < 1MB | |
| | List of the triple ids | 6GB | |
| XLORE | Embeddings of the entities | < 4GB | Download |
| | Embeddings of the relations | < 60MB | |
| | List of the entity ids | < 500MB | |
| | List of the relation ids | < 2MB | |
| | List of the triple ids | < 1GB | |
How to read the binary files:
#Python code to read the binary files.
import numpy as np
#filename is the path to one of the binary files, e.g. "entity2vec.bin".
vec = np.memmap(filename, dtype='float32', mode='r')
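The memmapped result is a flat float32 array. Below is a hedged sketch of reshaping it into an (N, dim) matrix; the embedding dimension dim and the entity-major layout are assumptions that should be checked against the corresponding release.
#A hedged reshape sketch; dim = 50 is an assumption, check the actual release.
import numpy as np
vec = np.memmap("entity2vec.bin", dtype='float32', mode='r')
dim = 50                                  # hypothetical embedding dimension
emb = np.asarray(vec).reshape(-1, dim)    # one row per entity, assuming entity-major layout
print(emb.shape)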
//C(C++) code to read the binary files via mmap.
#include <cstring>
#include <cstdio>
#include <cstdlib>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
const char *filename = "relation2vec.bin";
struct stat statbuf;
int fd;
float* vec;
int main() {
    //Get the file size, then map the whole file into memory read-only.
    if (stat(filename, &statbuf) != -1) {
        fd = open(filename, O_RDONLY);
        vec = (float*)mmap(NULL, statbuf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    }
    return 0;
}
More information: