Stack Exchange's data dump is a flat jumble of questions and answers, in which each answer refers to its question through a `ParentId` field.

Because the search engine extracts keywords for each thread as a whole, rather than for each question or answer separately, the data (which is very large) has to be regrouped by thread first. SQLite handles this task reasonably well. See [tools/stackexchange-converter](../../tools/stackexchange-converter).
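The regrouping can be sketched with Python's built-in `sqlite3` module. This sketch assumes the dump's `Posts.xml` layout, where each post is a `<row>` element with `Id`, `PostTypeId` (1 = question, 2 = answer), `ParentId` and `Body` attributes. The table name `posts`, the functions `load_posts` and `iter_threads`, and the sample rows are hypothetical illustrations, not the converter's actual code.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def _rows(xml_source):
    """Stream <row> elements as tuples without loading the whole file."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "row":
            parent = elem.get("ParentId")
            yield (
                int(elem.get("Id")),
                int(elem.get("PostTypeId")),
                int(parent) if parent else None,  # questions have no ParentId
                elem.get("Body", ""),
            )
            elem.clear()  # drop the element's attributes once it has been stored


def load_posts(xml_source, conn):
    """Copy posts into a hypothetical `posts` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        " id INTEGER PRIMARY KEY, post_type INTEGER,"
        " parent_id INTEGER, body TEXT)"
    )
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", _rows(xml_source))
    conn.commit()


def iter_threads(conn):
    """Yield (question_id, [bodies]) with the question first, then its answers."""
    query = (
        "SELECT COALESCE(parent_id, id) AS thread_id, body FROM posts "
        "WHERE post_type IN (1, 2) "  # only questions and answers
        "ORDER BY thread_id, parent_id IS NOT NULL, id"  # question sorts first
    )
    current_id, bodies = None, []
    for thread_id, body in conn.execute(query):
        if current_id is not None and thread_id != current_id:
            yield current_id, bodies
            bodies = []
        current_id = thread_id
        bodies.append(body)
    if current_id is not None:
        yield current_id, bodies


# Hypothetical sample in the assumed Posts.xml shape.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Body="How do I X?" />
  <row Id="2" PostTypeId="2" ParentId="1" Body="Do Y." />
  <row Id="3" PostTypeId="1" Body="Why Z?" />
  <row Id="4" PostTypeId="2" ParentId="1" Body="Or W." />
</posts>"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_posts(io.BytesIO(SAMPLE), conn)
    for qid, bodies in iter_threads(conn):
        print(qid, bodies)
    # 1 ['How do I X?', 'Do Y.', 'Or W.']
    # 3 ['Why Z?']
```

The sketch streams the XML with `iterparse` so the parser never holds the whole file in memory. It then lets SQLite's `ORDER BY` put the rows into thread order. SQLite can spill that sort to temporary storage, so the grouping step stays within memory limits even for a large dump. For a real dump, use a file-backed database rather than `":memory:"`.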