Bouraoui et al. (2020) - Inducing Relational Knowledge from BERT - They distill relational
knowledge from a PLM by using sentences with related terms as templates. At one point they say,
“even if language models capture relational knowledge, it is important to find the right sentences to extract that knowledge.” I’m not sure I agree that minor perturbations in prompts causing the facade of knowledge to collapse are merely an obstacle to overcome. I think it is rather indicative of the PLM not actually possessing relational knowledge, but instead possessing some sort of vague association between words in a sentence. For example, capital-of seems easy for PLMs, but that is likely because such pairs are often explicitly expressed, either as “X is the capital of Y” or simply as “X, Y” or the like, i.e. the two terms frequently co-occur. This is corroborated by the is-colour relations never working: colour is a trait that is rarely stated explicitly unless it is specific to a given instance of an item. Actually, isn’t the use of these templates putting the cart before the horse? Taking “Paris is located in central France” and replacing Paris with London and France with England would be a better test of whether the PLM has actually learnt relational knowledge: you would want “London is the capital of England” to score high and “London is located in central England” to score low, but if the model has just learnt to associate London and England, then you’d expect similar scores.
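To make that concrete, here is a minimal sketch of the substitution test, assuming the HuggingFace transformers library and bert-base-uncased; the sentence pair and the scoring method (pseudo-log-likelihood) are my choices for illustration, not the paper's protocol:

    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def pseudo_log_likelihood(sentence: str) -> float:
        """Score a sentence by masking each token in turn and summing the
        log-probability BERT assigns to the true token."""
        ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
        total = 0.0
        for i in range(1, ids.size(0) - 1):  # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tokenizer.mask_token_id
            with torch.no_grad():
                logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
        return total

    # If BERT has genuinely learnt capital-of, the first sentence should score
    # clearly higher; if it has merely learnt to associate London and England,
    # the two scores should be similar.
    for s in ["London is the capital of England.",
              "London is located in central England."]:
        print(s, pseudo_log_likelihood(s))
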
They then train a classifier (also updating the weights of BERT) to predict whether an assertion is true or not. They create negative samples by scrambling real samples and by reversing them. This seems awfully circular: you used BERT to come up with templates, and then you use these templates to test whether BERT has learnt anything about these relations by training BERT on these templates. A strong hmmm. I also wonder whether a classifier that predicts whether a pair is related based on a given template can only give a shallow indication of relational knowledge being encoded in a PLM. Maybe it would be better to try to predict the template given the pair, e.g. “Paris is the [MASK] of France”? That might work for BERT but not for other PLMs.
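As a hedged sketch of that alternative probe (again assuming the transformers library; this is my illustration, not something the paper does, and it only works when the relation word is a single wordpiece):

    from transformers import pipeline

    # Ask BERT to fill in the relation word directly, given the entity pair.
    fill = pipeline("fill-mask", model="bert-base-uncased")
    for candidate in fill("Paris is the [MASK] of France.", top_k=5):
        print(candidate["token_str"], round(candidate["score"], 3))

If “capital” ranks highly here, that is at least weak evidence that the relation word itself, and not just the Paris–France association, is accessible.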
They conclude, “This shows that the BERT language model indeed captures commonsense and factual knowledge to a greater extent than word vectors, and that such knowledge can be extracted from these models in a fully automated way.”
I’m not sure I agree with this conclusion. A more defensible claim would be that, in contexts where relational information is encoded, BERT can be fine-tuned to predict semantic relations based on these prompts.
References
Bouraoui, Z., Camacho-Collados, J., and Schockaert, S. 2020. Inducing relational knowledge from BERT. Proceedings of the AAAI Conference on Artificial Intelligence 34, 5, 7456–7463.