Is machine learning a buzzword?

Proof of Concept

We used our court rulings in BeckRS as a proof of concept. In the existing data, every document is already tagged with keywords. That makes supervised learning a good fit, in particular with artificial neural networks: the input data (raw documents) are shown to the so-called "model", and the desired output data (keywords) are expected in return. We mainly used TensorFlow, a beginner-friendly open source library from Google.
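To make the "desired output" concrete: a model that assigns several keywords per document is usually trained against a 0/1 vector with one position per known keyword. The following is only a minimal sketch of that encoding in plain Python, not the setup actually used; the toy documents and the helper names build_label_index and multi_hot are assumptions for illustration.

```python
def build_label_index(keyword_lists):
    """Map every distinct keyword to a fixed position in the output vector."""
    vocabulary = sorted({kw for kws in keyword_lists for kw in kws})
    return {kw: i for i, kw in enumerate(vocabulary)}


def multi_hot(keywords, label_index):
    """Return a 0/1 vector with a 1 at the position of each assigned keyword."""
    vector = [0] * len(label_index)
    for kw in keywords:
        vector[label_index[kw]] = 1
    return vector


# Hypothetical toy data; real rulings and keywords would come from the archive.
documents = [
    ("Ruling about noise between two apartments",
     ["condominium", "impact sound insulation"]),
    ("Ruling about a faulty repair",
     ["elimination of defects"]),
]

label_index = build_label_index([kws for _, kws in documents])
targets = [multi_hot(kws, label_index) for _, kws in documents]
```

Each entry in targets is what the model would be expected to reproduce for the corresponding document during training.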

In detail, some preprocessing was also needed: because neural networks can only process numbers, the texts first had to be converted using natural language processing techniques. The software used includes the Natural Language Toolkit, fastText and Solr.
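As a rough illustration of turning text into numbers, here is a simple bag-of-words count vector in plain Python. This is a deliberately simplified stand-in, not the pipeline built with the tools named above; the function names and sample sentences are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def build_vocabulary(texts):
    """Assign every distinct token a fixed index, in alphabetical order."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Count how often each vocabulary token occurs; unknown tokens are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


# Hypothetical sample sentences standing in for court rulings.
texts = ["The tenant complained about noise.", "Noise from the flat above."]
vocabulary = build_vocabulary(texts)
example_vector = vectorize("The noise, the noise!", vocabulary)
```

Word embeddings such as those produced by fastText replace these raw counts with dense vectors, but the principle of mapping text to fixed-length numeric input is the same.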

After a time-consuming training process, the model is ready to make fast predictions.

Result

In tests, the model yields the correct keywords with a certainty of more than 95%. For example, a document had been manually assigned the keywords "condominium" and "impact sound insulation", and the forecast returned:

99% Impact sound insulation
99% Dream condo
97% Impact sound insulation
96% Elimination of defects
96% purpose
96% Condominium
95% Thing
95% senate
95% Job
95% Law

Generic terms such as "Law" are then filtered out in a post-processing step.
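A post-processing step of this kind could look roughly like the sketch below, which keeps predictions at or above a threshold, drops terms on a stop list, and collapses duplicates to their best score. The scores are the ones from the forecast above; the function name select_keywords, the 0.95 threshold, and a stop list containing only "law" are assumptions for illustration, not the actual filter.

```python
def select_keywords(predictions, threshold=0.95, stoplist=frozenset()):
    """Keep confident, non-generic keywords; one entry per keyword, best score wins."""
    best = {}
    for score, keyword in predictions:
        key = keyword.lower()
        if score < threshold or key in stoplist:
            continue
        if key not in best or score > best[key][0]:
            best[key] = (score, keyword)
    # Highest score first; ties keep their original order.
    return sorted(best.values(), key=lambda item: -item[0])


# Forecast from the example above, as (score, keyword) pairs.
forecast = [
    (0.99, "Impact sound insulation"), (0.99, "Dream condo"),
    (0.97, "Impact sound insulation"), (0.96, "Elimination of defects"),
    (0.96, "purpose"), (0.96, "Condominium"), (0.95, "Thing"),
    (0.95, "senate"), (0.95, "Job"), (0.95, "Law"),
]

# Assumed stop list; a real one would contain many more generic terms.
filtered = select_keywords(forecast, stoplist=frozenset({"law"}))
```

In practice the stop list would be curated by editors, since which terms count as too generic depends on the legal domain.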

Conclusion

In the end, machine learning convinced us. Especially when a lot of existing data is available, supervised learning delivers satisfactory results, even if some effort has to be put into pre- and post-processing.