🛣[Deep Learning]Stanford CS224w:Machine Learning with Graphs
想说的话🎇
🔝课程网站:http://web.stanford.edu/class/cs224w/
👀一些资源: B站精讲:https://www.bilibili.com/video/BV1pR4y1S7GA/?spm_id_from=333.337.search-card.all.click&vd_source=280e4970f2995a05fdeab972a42bfdd0
https://github.com/TommyZihao/zihao_course/tree/main/CS224W
Slides: http://web.stanford.edu/class/cs224w/slides
Reasoning over KGs
Goal: How to perform multi-hop reasoning over KGs?
- Path Querise
An N-hop path query q can be represented by
Question: What proteins are associated with adverse events caused by Fulvestrant?
-
\(v_a\) is :
e:Fulvestrant
-
\((r_1,r_2)\) is
(r:Causes, r:Assoc)
-
Query:
(e:Fulvestrant,(r:Causes,r:Assoc))
Because the completed(probabilistic) KGs is a dense graph, we need a way to answer path-based queries over an incomplete knowledge graph
Task: Predictive queries
- Want to be able to answer arbitrary queries while implicitly imputing for the missing information
Key Idea:Embed queries
-
Generalize TransE to multi-hop reasoning.
-
Query embedding: \(q = h + r\)
-
Goal: query embedding \(q\) should be close to the answer embedding \(t\)
Since TransE can naturally handle compositional relations, it can handle path queries by translating in the latent space for multiple hops using addition of relation embeddings.(DistMult / ComplEx can't)
- Conjunctive Queries
Conjunctive Queries: What are drugs that cause Short of Breath and treat diseases associated with protein ESR2?
(e:ESR2, (r:Assoc, r:TreatedBy)), (e:Short of Breath, (r:CausedBy))
How can we use embeddings to implicitly impute the missing edges?
- Query2Box
Projection Operator \(\mathcal{P}\):
Intuition:Take the current box as input and use the relation embedding to project and expand the box
How do we take intersection of boxes?
How do we define the score function \(f_q(v)\) (negative distance) ?
where \(0< \alpha <1\)
Intuition: if the point is enclosed in the box, the distance should be downweighted.
\(\alpha\)让应该在box内部的点更加靠近box中心,增强鲁棒性。
AND-OR queries (union operation)
E.g.: What drug can treat breast cancer or lung cancer?
AND-OR queries: Conjunctive queries + disjunction, called Existential Positive First-order (EPFO) queries.
问题:在box的表示形式下,使用union操作的话,那么我们的嵌入空间维度会很大,节点越多维度越大。而在知识图谱中,节点非常多。
因此我们很难在低维空间中嵌入AND-OR查询
Key idea: take all unions out and only do union at the last step
Logically, any AND-OR query can be expressed as a disjunction of conjunctive queries.
Distance between entity embedding and a DNF \(q=q_1 \cup q_2 \cup... \cup q_m\) is defined as:
Training Query2Box
-
Training:
-
Smaple a query \(q\) from the training graph \(G_{train}\), answer \(v \in [q]_{G_{train}}\),and a negative sample \(v' \notin [q]_{G_{train}}\)
Negative sample: Entity of same type as \(v\) but not answer to \(q\)
-
Embed the query \(\mathbf{q}\)
-
Calculate the score \(f_q(v)\) and \(f_q(v')\).
-
Optimize the loss \(\mathcal{l}\) to maximize \(f_q(v)\) while minimize \(f_q(v')\).
Query Template
A Simple Example
We use t-SNE to reduce the dimension of the embedding space to 2D for visualization.
TP (True Positive):实际为正,预测也为正。TN (True Negative):实际为负,预测也为负。FP (False Positive):实际为负,但预测为正。也被称为 I 类错误或“假阳性”。FN (False Negative):实际为正,但预测为负。也被称为 II 类错误或“假阴性”。