Entity SEO and Semantic Publishing
The Entities’ Swissknife: the app that makes your job easier
The Entities’ Swissknife is an app developed in Python and entirely dedicated to Entity SEO and Semantic Publishing, supporting on-page optimization around the entities recognized by the Google NLP API or the TextRazor API. In addition to Entity extraction, The Entities’ Swissknife enables Entity Linking by automatically generating the Schema Markup needed to make explicit to search engines which entities the content of our web page refers to.
The Entities’ Swissknife can help you to:
- know how NLU (Natural Language Understanding) algorithms “understand” your text so you can optimize it until the topics that are most important to you have the best relevance/salience score;
- analyze your competitors’ pages in SERPs to discover possible gaps in your content;
- generate the semantic markup in JSON-LD to be injected in the schema of your page to make explicit to search engines what topics your page is about;
- analyze short texts such as ad copy or a bio/description for an about page. You can fine-tune the text until Google recognizes with sufficient confidence the entities that are relevant to you and assigns them the correct salience score.
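To make the salience idea concrete, here is a minimal sketch of how entities returned by an NLU API could be ranked by their salience score. The list below only mimics the shape of Google NLP API results (each entity carries a name and a salience between 0 and 1); the names and values are illustrative, not real API output.

```python
# Sketch: rank entities by salience, as returned by an NLU API.
# The `entities` list mimics the shape of Google NLP API results;
# the values here are illustrative, not real API output.

def rank_by_salience(entities, top_n=5):
    """Return the top_n entities sorted by descending salience."""
    return sorted(entities, key=lambda e: e["salience"], reverse=True)[:top_n]

entities = [
    {"name": "Entity SEO", "salience": 0.42},
    {"name": "Schema Markup", "salience": 0.18},
    {"name": "Streamlit", "salience": 0.07},
]

for e in rank_by_salience(entities, top_n=2):
    print(f'{e["name"]}: {e["salience"]:.2f}')
```

Optimizing a text then becomes an iterative loop: edit the copy, re-run the analysis, and check whether the entities you care about climb toward the top of this ranking.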
Written by Massimiliano Geraci for Studio Makoto, The Entities’ Swissknife has been publicly released on Streamlit, a platform that since 2020 has earned itself a respectable place among data scientists using Python.
It may be helpful to clarify what is meant by Entity SEO, Semantic Publishing, Schema Markup, and then dive into using The Entities’ Swissknife.
- The Entities’ Swissknife: the app that makes your job easier
- The Entities’ Swissknife can help you to:
- Entity SEO
- Semantic publishing
- Differences between a Lexical Search Engine and a Semantic Search Engine
- Topic Modeling and Content Modeling
- Entity linking / Wikification
- The “about,” “mentions,” and “sameAs” properties of the markup schema
- How to correctly use the properties about and mentions
- How to Use The Entities’ Swissknife
- When to choose TextRazor APIs or Google NLP APIs
- Copy Sandbox
- Other options
- Calculation of entity frequency and possible fallbacks
Entity SEO is the on-page optimization activity that considers not the keywords but the entities (or sub-topics) that constitute the page’s topic.
The watershed that marks the birth of Entity SEO is the article published on the official Google Blog announcing the creation of its Knowledge Graph.
The famous title “from strings to things” clearly expresses what would become the primary trend in Search at Mountain View in the years to come.
To understand and simplify things, we can say that “things” is more or less a synonym for “entity.”
In general, entities are objects or concepts that can be uniquely identified: typically people, places, objects, and concepts.
It is easier to understand what an entity is by referring to Topics, a term Google prefers to use in its communications for a broader audience.
On closer inspection, topics are semantically broader than things. In turn, the things that belong to a topic, and contribute to defining it, are entities.
Therefore, to quote my dear professor Umberto Eco, an entity is any concept or object belonging to the world or one of the many “possible worlds” (literary or fantasy worlds).
Semantic Publishing is the activity of publishing a page on the Internet to which a layer is added, a semantic layer in the form of structured data that describes the page itself. Semantic Publishing helps search engines, voice assistants, or other intelligent agents understand the page’s meaning, context, and structure, making information retrieval and data integration more efficient.
Semantic Publishing relies on adopting structured data and linking the entities covered in a document to the same entities in various public databases.
As it appears printed on the screen, a web page contains information in an unstructured or poorly structured format (e.g., the division of paragraphs and sub-paragraphs) designed to be understood by humans.
Differences between a Lexical Search Engine and a Semantic Search Engine
While a traditional lexical search engine is roughly based on matching keywords, i.e., simple text strings, a Semantic Search Engine can “understand” – or at least try to – the meaning of words, their semantic correlation, the context in which they are inserted within a document or a query, thus achieving a more precise understanding of the user’s search intent in order to generate more relevant results.
A Semantic Search Engine owes these capabilities to NLU algorithms, Natural Language Understanding, as well as the presence of structured data.
Topic Modeling and Content Modeling
The mapping of the discrete units of content (Content Modeling) to which I referred can usefully be carried out in the design phase and related to the map of topics treated or to be treated (Topic Modeling) and to the structured data that expresses both.
It is a fascinating practice (let me know on Twitter or LinkedIn if you would like me to write about it or make an ad hoc video) that allows you to design a site and develop its content for an exhaustive treatment of a topic to acquire topical authority.
Topical Authority can be described as “depth of expertise” as perceived by search engines. In the eyes of search engines, you can become an authoritative source of information concerning the network of (semantic) entities that define a topic by consistently writing original, high-quality, comprehensive content that covers your broad topic.
Entity linking / Wikification
Entity Linking is the process of identifying entities in a text document and relating these entities to their unique identifiers in a Knowledge Base.
Wikification occurs when the entities in the text are mapped to the entities in the Wikimedia Foundation resources, Wikipedia and Wikidata.
The Entities’ Swissknife helps you structure your content and make it easier for search engines to understand by extracting the entities in the text that are then wikified.
If you select the Google NLP API, entity linking will also occur to the corresponding entities in the Google Knowledge Graph.
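The output of wikification can be pictured as a mapping from each recognized entity to its canonical identifiers. The sketch below builds Wikipedia and Wikidata URIs from a hypothetical, simplified linker output; the entity/Q-ID pair shown is purely illustrative.

```python
# Sketch: "wikification" of extracted entities, i.e., mapping each
# entity to its Wikipedia URI and (when available) its Wikidata URI.
# The `linked` dict is a hypothetical, simplified linker output.

def wikipedia_uri(title, lang="en"):
    """Build the Wikipedia URI for a page title."""
    return f"https://{lang}.wikipedia.org/wiki/{title.replace(' ', '_')}"

def wikidata_uri(qid):
    """Build the Wikidata URI for a Q-identifier."""
    return f"https://www.wikidata.org/wiki/{qid}"

# Illustrative mapping: entity title -> Wikidata Q-ID
linked = {"Search engine optimization": "Q_EXAMPLE"}

for title, qid in linked.items():
    print(wikipedia_uri(title), wikidata_uri(qid))
```

These URIs are exactly the values that end up in the sameAs property of the generated markup.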
The “about,” “mentions,” and “sameAs” properties of the markup schema
Entities can be injected into semantic markup to explicitly state that our document is about some specific place, product, brand, concept, or object.
The schema vocabulary properties that are used for Semantic Publishing and that act as a bridge between structured data and Entity SEO are the “about,” “mentions,” and “sameAs” properties.
These properties are as powerful as they are, unfortunately, underutilized by SEOs, especially by those who use structured data for the sole purpose of obtaining Rich Results (FAQs, review stars, product features, videos, internal site search, etc.), which Google created both to improve the appearance and functionality of the SERP and to incentivize the adoption of this standard.
Declare the primary topic/entity of your document (web page) with the about property.
Use the mentions property instead to declare secondary topics, even for disambiguation purposes.
How to correctly use the properties about and mentions
The about property should refer to 1-2 entities at most, and these entities should be present in the H1 title.
Mentions should number no more than 3-5, depending on the article’s length. As a general rule, an entity (or sub-topic) should be explicitly mentioned in the schema markup if a paragraph, or a sufficiently significant portion, of the document is devoted to that entity. Such “mentioned” entities should also be present in the relevant heading, H2 or lower.
Once you have selected the entities to use as the values of the about and mentions properties, The Entities’ Swissknife performs Entity Linking via the sameAs property and generates the schema markup to nest into the one you have already created for your page.
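To illustrate the shape of that output, here is a minimal sketch that assembles a schema.org WebPage node with about, mentions, and sameAs, serialized as JSON-LD. The entity names and URLs are illustrative examples, not the app’s actual output.

```python
import json

# Sketch: schema.org markup declaring the page's primary topic (about)
# and a secondary topic (mentions), each wikified via sameAs.
# Entity names and URLs are illustrative.

def entity_node(name, same_as):
    """Build a schema.org Thing with its sameAs links."""
    return {"@type": "Thing", "name": name, "sameAs": same_as}

markup = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "about": [
        entity_node(
            "Entity SEO",
            ["https://en.wikipedia.org/wiki/Search_engine_optimization"],
        ),
    ],
    "mentions": [
        entity_node(
            "Knowledge Graph",
            ["https://en.wikipedia.org/wiki/Google_Knowledge_Graph"],
        ),
    ],
}

print(json.dumps(markup, indent=2))
```

The resulting JSON-LD block can then be nested into the structured data you have already defined for the page.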
How to Use The Entities’ Swissknife
You must enter your TextRazor API key or upload the credentials (the JSON file) for the Google NLP API.
To get the API keys, sign up for a free account on the TextRazor website or the Google Cloud Console, following these simple instructions.
Both APIs provide a free daily quota of calls, which is more than enough for personal use.
In the current online version, you don’t need to enter any key, because I have decided to allow the use of my own API keys (stored as secrets on Streamlit) as long as my daily quota is not exceeded. Take advantage of it!
When to choose TextRazor APIs or Google NLP APIs
From the right sidebar, you can select whether to use the TextRazor API or the Google NLP API from the respective dropdown menus. Moreover, you can decide if the input will be a URL or a text.
I prefer to use the TextRazor API for injecting entities into structured data and, therefore, for Semantic Publishing proper. This API extracts both the URI of the corresponding page on Wikipedia and the ID (the Q) of the corresponding entry on Wikidata.
If you want to add, as the sameAs property of your schema markup, the Knowledge Panel URL related to an entity (which is built starting from the entity ID within the Google Knowledge Graph), then you will need to use the Google API.
If you want to use The Entities’ Swissknife as a copy sandbox, i.e., to test how a sales copy, a product description, or the biography on your entity home page is understood, then it is better to use Google’s API, since it is Google that will have to understand our copy.
You can also choose to extract entities only from the meta_title, meta_description, and headline1-4 fields.
By default, The Entities’ Swissknife, which uses Wikipedia’s public API to scrape entity definitions, limits the scraping, to save time, to only the entities selected as about and mentions values. However, you can check an option to scrape the descriptions of all extracted entities, not just the selected ones.
If you choose the TextRazor API, you can also extract the Categories and Topics of the document according to the Media Topics taxonomy of more than 1,200 terms curated by the IPTC.
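In practice, topic classification results are usually filtered by a confidence threshold before being shown. The sketch below works on a simplified, hypothetical response shape (real responses come from the TextRazor client); the labels and scores are illustrative, not real API output.

```python
# Sketch: filter document Topics by confidence score. The `topics`
# list mimics a simplified TextRazor-style response; labels and
# scores are illustrative, not real API output.

def relevant_topics(topics, min_score=0.7):
    """Keep topics whose score meets the threshold, best first."""
    keep = [t for t in topics if t["score"] >= min_score]
    return sorted(keep, key=lambda t: t["score"], reverse=True)

topics = [
    {"label": "search engine optimisation", "score": 0.94},
    {"label": "online media", "score": 0.81},
    {"label": "economy, business and finance", "score": 0.35},
]

for t in relevant_topics(topics):
    print(f'{t["label"]} ({t["score"]:.2f})')
```

Raising or lowering min_score trades coverage for precision when deciding which topics describe the document.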
Calculation of entity frequency and possible fallbacks
The count of occurrences of each entity is shown in the table, and a specific table is reserved for the top 10 most frequent entities.
Although a stemmer (the Snowball library) has been implemented to disregard masculine/feminine and singular/plural forms, the entity frequency count refers to the so-called “normalized” entities and not to the strings, i.e., the exact words with which the entities are expressed in the text.
For example, if the word “SEO” is present in the text, the corresponding normalized entity is “Search Engine Optimization,” and the frequency of that entity could be distorted, or even 0, if in the text the entity is always expressed through the string/keyword “SEO.” The old keywords are nothing other than the strings through which the entities are expressed.
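The distinction can be sketched in a few lines: counting over normalized entities means that different surface strings for the same concept are tallied together. The mention-to-entity mapping below is illustrative, not the app’s actual dictionary.

```python
from collections import Counter

# Sketch: count entity frequency over *normalized* entities rather
# than surface strings, so that "SEO" and "Search Engine Optimization"
# are tallied together. The mapping is illustrative.

NORMALIZED = {
    "seo": "Search Engine Optimization",
    "search engine optimization": "Search Engine Optimization",
}

def entity_frequency(mentions):
    """Count occurrences per normalized entity."""
    return Counter(NORMALIZED.get(m.lower(), m) for m in mentions)

freq = entity_frequency(["SEO", "seo", "Search Engine Optimization"])
print(freq["Search Engine Optimization"])  # → 3
```

A count keyed on the raw strings would instead report “Search Engine Optimization” only once, which is exactly the distortion described above.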
In conclusion, The Entities’ Swissknife is a powerful tool that can help you improve your search engine rankings through Semantic Publishing and Entity Linking, making your site more search engine friendly.
For any comment or clarification, I am available on Twitter @max_geraci.