The basic idea is to replace the application user with a robot, i.e. a computer program that mimics user behaviour. Of course, this is tricky, as it imposes several limitations on the resulting traffic files. For example, the statistical properties of the traffic flows will be affected by the robot and should not be used as features for traffic classification.
However, several other kinds of features may still be usable for classification, for example the protocol format and side-channel communications such as DNS.
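A generic system for automatic trace generation requires two key elements: (1) a sniffer that captures the traffic of a single application, and (2) a GUI automation system that drives the application.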
For (1), the Mutrics project developed a specialized single-application sniffer, tracedump. The program attaches to a given Linux process and writes all of its IP packets to disk.
For (2), several possible solutions exist; see the Wikipedia "List of GUI testing tools" for a quick reference.
Once the target application is chosen, a special robot (implementing a user model) is programmed using the GUI automation system and run repeatedly, so that the application generates IP traffic. At the same time, the sniffer captures all of the traffic data.
If the resulting PCAP file cannot be published, the script written for the GUI automation system can be published instead, so that a similar IP traffic trace file can be regenerated.
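The workflow described above (start the application, attach the sniffer to its process, run the robot repeatedly) could be orchestrated roughly as in the following Python sketch. It is only an illustration under stated assumptions: the function names, the parameters and the idea of passing the sniffer command as a function of the process ID are hypothetical and not part of tracedump or of any particular GUI automation tool. The actual command lines must be taken from the documentation of the tools in use.

```python
import signal
import subprocess
import time
from typing import Callable, List, Optional, Sequence


def stop(proc: Optional[subprocess.Popen], sig: int, grace_seconds: float) -> None:
    """Send `sig` to a still-running child and wait; kill it if it does not exit."""
    if proc is None or proc.poll() is not None:
        return
    proc.send_signal(sig)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


def capture_session(
    app_cmd: Sequence[str],
    make_sniffer_cmd: Callable[[int], Sequence[str]],
    robot_cmd: Sequence[str],
    runs: int = 3,
    settle_seconds: float = 1.0,
    grace_seconds: float = 10.0,
) -> List[int]:
    """Run the robot `runs` times while a sniffer is attached to the application.

    All three commands are supplied by the caller (assumption: the sniffer
    needs the PID of the target process, so its command is built from it).
    Returns the exit code of each robot run.
    """
    app = subprocess.Popen(list(app_cmd))
    sniffer = None
    try:
        # Attach the sniffer to the freshly started application process.
        sniffer = subprocess.Popen(list(make_sniffer_cmd(app.pid)))
        time.sleep(settle_seconds)  # give the sniffer time to attach
        # Each robot run replays the user model against the application.
        return [subprocess.run(list(robot_cmd)).returncode for _ in range(runs)]
    finally:
        # Stop the sniffer first so the capture ends, then the application.
        stop(sniffer, signal.SIGINT, grace_seconds)
        stop(app, signal.SIGTERM, grace_seconds)
```

A call might look like `capture_session(["firefox"], lambda pid: ["tracedump", str(pid)], ["./robot.sh"])`, where the tracedump arguments and the robot script name are placeholders rather than a verified invocation. Publishing the robot script together with such a driver is what makes it possible to regenerate a comparable trace later.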
Below you can find examples for Google Maps, Google Docs and Google Spreadsheets. The Google Docs examples are particularly hard, because little side-channel and protocol format information is available.