Since February, I’ve been wondering how far ORCID has drifted from its role as a reliable ID system. It started when I found scads of fake accounts in its system, presumably created to boost the SEO of various commercial ventures (some illicit) peddling things like cryptocurrency, blockchain, drugs, and porn.
Some of these accounts appear to have been removed since February (but Princess Leia Lucas remains), while other bogus IDs are gradually filtering in.
This all suggested that ORCID is far less robust and trustworthy than advertised, and its data seemed to confirm this.
To check how things have evolved since the start of 2021, I ran two searches yesterday: one within ORCID for “modified 2021,” which would tend to surface records created or modified this year; and a Google site search of ORCID restricted to January 1 through April 25, 2021.
The Google search returned far fewer results than the ORCID search.
Based on this and other searches, it appears Google strips out most ORCID records, displaying only ~35 records for 2021, for instance. In these results, there were no bogus records I could identify. They all looked like real people creating real records. The records weren’t all robust (many ORCID records are not), but they also weren’t obviously fake, and the best ones were on Page 1, as expected.
But Google is overdoing it, removing every entry with “porn” in it, for example, even though this is sometimes a valid surname.
By contrast, searching ORCID itself for records created or modified in 2021 turned up, just on the first page of results, an ORCID for an OpenCart app in Turkey, a slot machine’s ORCID, a dishwasher’s ORCID (the appliance, I gather from the context), and an ORCID advertising a way to stream the April 21 Paul Gallen vs. Lucas Browne boxing match for free.
On that page, the ratio of obviously bogus accounts to possibly valid accounts seems to be about 50:50.
This may be why Google returns so few results from ORCID: its technologists and algorithms have determined that ORCID is unreliable, so they have tightened the filters on what the index ingests and processes.
Google’s elimination of millions of ORCID records suggests that attempts to use ORCID to boost SEO are largely ineffective, since Google filters them out. It also shows that keeping bogus accounts out is technically feasible, even in a recursive manner. Finally, it probably means ORCID is largely shut out of the broader discovery ecosystem. For instance, a similar search in Bing returned even sparser results, along with a notice that some results had been removed and a link to a page where the relevant criteria appear to be “Quality, safety, and user demand.”
If only ORCID were capable of keeping its own house clean.
Is that really too much to ask of an organization with these four core strategies (“Trusted Assertions” being the most relevant to this discussion)?
It looks like the search engines don’t trust ORCID, either.