Instructions: Frequency list

The Google Web 1T 5-Gram Database is a collection of frequent 5-grams extracted from approximately 1 trillion words of Web text collected by Google Research. This Web interface runs interactive queries on an indexed version of the database and displays the most frequent N-grams matching a specified search pattern. To rank matches by association strength instead, click the Associations tab at the top of this page. For the Web interface, the N-grams have been case-folded and further normalized, so frequency counts may occasionally differ from those in the original Google data. The normalized N-grams are indexed in several SQLite databases with a total size of 180 gigabytes. For further questions or bug reports, please contact Stefan Evert.
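The effect of case-folding on frequency counts can be illustrated with a minimal Python sketch. It assumes that normalization is plain lowercasing (the additional normalization steps are not specified on this page), the helper name fold_counts is hypothetical, and the counts are invented toy values, not real Web 1T figures.

```python
from collections import Counter

def fold_counts(raw_counts):
    """Merge N-gram counts whose surface forms differ only by case.

    Assumption: case-folding is modelled as simple lowercasing.
    """
    folded = Counter()
    for ngram, freq in raw_counts.items():
        folded[ngram.lower()] += freq
    return folded

# Toy counts for illustration only (not real Web 1T data).
raw = {"New York": 120, "new york": 30, "NEW YORK": 5, "old town": 7}
folded = fold_counts(raw)
print(folded["new york"])  # 155
```

After folding, the three spellings share a single entry, which is why a count shown here can be larger than any single entry in the original data.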

Search pattern

The search pattern consists of up to 5 terms, which represent the elements of an N-gram and must be separated by blanks. Unigram queries are currently not allowed, i.e. you have to specify at least 2 terms. Our database engine supports five different types of search terms:

Click Search to execute your query, Help to display this help page, or Reset Form to start over. The CSV button returns the results as a CSV table suitable for import into a spreadsheet program or database. The XML button returns the results in an XML format, which allows this interface to be used as a Web service.
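As a rough illustration of processing the CSV output outside a spreadsheet, the following Python sketch parses a result table into tuples. The column names ngram and frequency, the helper read_results, and the sample rows are all assumptions made for this example; check the header of an actual download and adjust accordingly.

```python
import csv
import io

def read_results(csv_text):
    """Parse CSV search results into a list of (ngram, frequency) tuples.

    Assumption: the table has a header row with the hypothetical
    column names 'ngram' and 'frequency'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["ngram"], int(row["frequency"])) for row in reader]

# Invented sample text standing in for a downloaded CSV file.
sample = "ngram,frequency\ninteresting thing,100\ninteresting fact,80\n"
rows = read_results(sample)
for ngram, freq in rows:
    print(f"{freq:>6}  {ngram}")
```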


You can customise the display format of search results with the option menus below the search pattern:


The examples below include comments starting with //, which must not be entered in the search pattern field.

interesting *             // what are people most interested in?

* violin                  // '*' at the start of a query is much slower

met ? * [man,woman]       // use '?' to skip determiner etc.

[enjoy,enjoys] ? *        // what do people enjoy? (use "collapsed" display)

%ization ? * health       // use with "grouped" display

from * to *               // a classic of Googleology

antidisestablishmentarianism ?  // a trick to obtain unigram frequencies
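If you keep example queries with their // comments in a file, a small Python sketch like the following can strip the comments and check the term count before a pattern is pasted into the search field. The function names are hypothetical, and the sketch assumes that terms contain no internal blanks (for instance, [man,woman] is written without spaces) and that // never occurs inside a term.

```python
def clean_pattern(line):
    """Strip a trailing '//' comment and surrounding whitespace."""
    return line.split("//", 1)[0].strip()

def check_pattern(pattern):
    """Return the list of terms; raise ValueError unless there are 2 to 5."""
    terms = pattern.split()
    if not 2 <= len(terms) <= 5:
        raise ValueError(f"expected 2 to 5 terms, got {len(terms)}")
    return terms

example = "met ? * [man,woman]       // use '?' to skip determiner etc."
terms = check_pattern(clean_pattern(example))
print(terms)  # ['met', '?', '*', '[man,woman]']
```

The check only counts terms; it does not verify that each term is one of the supported search term types.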