iCoSys is proud to present SwissCrawl, the largest Swiss German text corpus to date!
The corpus was built by Lucy Linder under the supervision of Jean Hennebert and Andreas Fischer. It comprises more than half a million sentences, collected with a customized web scraping tool that could be applied to other low-resource languages as well.
Want to inspect the code? Click here.
Want to know a bit more about the proceedings? Read the arXiv paper here, and/or read the LREC 2020 paper here.
SwissCrawl is released under the Creative Commons CC BY-NC 4.0 license and is free for non-commercial use only. Go check it out on iCoSys.