First, I will discuss a preprocessing tool that selects parallel sentence pairs suitable for comparative syntactic research by filtering out pairs that are syntactically too different. Experiments on Dutch, German and English suggest that a graph edit distance on parse trees yields the best results.
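To make the filtering idea concrete, here is a minimal sketch. It does not compute a true graph edit distance; as a simplified stand-in it counts the labelled edges that would have to be deleted or inserted to turn one parse into the other. The triple format, the function names, the sample parses and the 0.5 threshold are all assumptions made for illustration, not details of the tool itself.

```python
from collections import Counter

# Hypothetical toy format: a parse is a list of
# (head_pos, relation, dependent_pos) triples, one per dependency edge.


def edge_distance(parse_a, parse_b):
    """Count edge deletions plus insertions between two edge multisets.

    This is a crude proxy for graph edit distance: it ignores node
    identity and substitutions, and only compares labelled edges.
    """
    a, b = Counter(parse_a), Counter(parse_b)
    return sum(((a - b) + (b - a)).values())


def normalised_distance(parse_a, parse_b):
    """Scale the distance to [0, 1] by the total number of edges."""
    total = len(parse_a) + len(parse_b)
    return edge_distance(parse_a, parse_b) / total if total else 0.0


def filter_pairs(pairs, threshold=0.5):
    """Keep only sentence pairs whose parses are similar enough."""
    return [(a, b) for a, b in pairs if normalised_distance(a, b) <= threshold]


# Invented sample parses (not taken from any corpus).
parse_x = [("VERB", "nsubj", "PRON"), ("VERB", "obj", "NOUN"), ("NOUN", "det", "DET")]
parse_y = [("VERB", "nsubj", "PRON"), ("VERB", "obj", "NOUN"), ("NOUN", "det", "DET")]
parse_z = [
    ("VERB", "nsubj", "NOUN"),
    ("VERB", "obl", "NOUN"),
    ("NOUN", "case", "ADP"),
    ("NOUN", "det", "DET"),
]

kept = filter_pairs([(parse_x, parse_y), (parse_x, parse_z)])
```

With these sample parses, the first pair is structurally identical and is kept, while the second pair exceeds the threshold and is filtered out.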
Furthermore, I will discuss recent results on extracting syntactic differences from parallel corpora. We build on the method of Wiersma et al. (2011) and apply the Minimum Description Length principle to this task. After mining characteristic part-of-speech patterns by compressing the data, we extract differences between languages in the distribution of the patterns found. Experiments on Dutch, English and Czech yield useful and meaningful differences that can guide linguists in their comparative syntactic research.
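The second step, comparing how patterns are distributed across languages, can be sketched as follows. Note the substitution: instead of compression-based pattern mining, this sketch simply counts part-of-speech n-grams and ranks them by the difference in relative frequency. The function names, the tag sequences and the choice of bigrams are illustrative assumptions only.

```python
from collections import Counter


def ngram_counts(tagged_sentences, n=2):
    """Count POS n-grams over a list of tag sequences."""
    counts = Counter()
    for tags in tagged_sentences:
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


def relative_freqs(counts):
    """Turn raw counts into relative frequencies."""
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}


def distribution_differences(corpus_a, corpus_b, n=2):
    """Rank patterns by absolute difference in relative frequency.

    Positive values mean the pattern is relatively more frequent in
    corpus_a; negative values mean it is more frequent in corpus_b.
    """
    fa = relative_freqs(ngram_counts(corpus_a, n))
    fb = relative_freqs(ngram_counts(corpus_b, n))
    diffs = {p: fa.get(p, 0.0) - fb.get(p, 0.0) for p in set(fa) | set(fb)}
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))


# Invented tag sequences: one side places adjectives before the noun,
# the other after it. They do not represent any real language sample.
corpus_a = [["DET", "ADJ", "NOUN"], ["DET", "ADJ", "NOUN", "VERB"]]
corpus_b = [["DET", "NOUN", "ADJ"], ["DET", "NOUN", "ADJ", "VERB"]]

ranked = distribution_differences(corpus_a, corpus_b)
```

In this toy data, the adjective-noun orderings surface at the top of the ranking, which is the kind of contrast a linguist could then inspect in the underlying sentences.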