
This is a proposed guideline on how to use bots and humans systematically, working together, to ensure that Wikipedia's non-free images are brought into line with, and remain in compliance with, Wikipedia:Non-free content criteria. It is intended to document current and past procedures, and to include suggestions for future improvements.

Background

Early history

This section needs a brief account of practice before 2007, which was itself far from uniform: by all accounts, things varied markedly from year to year between 2001 and 2006. A rationale has been required by non-free content policy since its earliest versions. Lacking a rationale explicitly became a criterion for speedy deletion on 4 May 2006. The fair use policies date back to 2004, and were rewritten and renamed "non-free content" in April 2007. An example of early fair-use policy from September 2004 can be seen at this version of Wikipedia:Image description page (redirected in April 2006 to Help:Image page).

Wikimedia Foundation resolution

Non-free content policy

Originally called "Wikipedia:Fair use" and "Wikipedia:Fair use criteria", these were renamed "Non-free content" on 16 April 2007.

This has been adopted as the authoritative Exemption Doctrine Policy (EDP) of English Wikipedia.

Volume of images

One of the main problems with image uploads is the sheer volume of uploads compared to the number of editors prepared to do quality work in the image namespace. Far fewer editors actively work on images and image policy compliance than on text contributions (though even there, bots help identify copyright violations), and their number is small relative to the number of images uploaded, especially by those who actively ignore policy. In addition, text contributions are fluid and editable, whereas image contributions tend to be discrete, single-upload contributions that are either added to or removed from an article. It is the volume of uploads of non-free images (or indeed any images without sources) that requires bots to be used as 'gatekeepers'.

Some stats on the volume of images over time. See Special:Statistics for the latest media files total. No data supplied or found yet for each year.

  • 2001
  • 2002
  • 2003
  • 2004
  • 2005
  • 2006
  • 2007
  • 2008

As of 17 February 2008, Wikipedia has 773,658 media files (this includes video clips and sound clips, as well as images). That figure includes around 300,000 non-free images (estimated figure). Others have quoted a figure of around 50% for non-free images compared to total number of images. This can be compared to the total number of articles: 2,235,536 as of 17 February 2008. Another interesting statistic would be the number of articles using non-free media files (some articles use none, some use more than one). It would also be interesting to know how many images from Commons (these are free ones) are used on Wikipedia. There will be some overlap between the free images on Wikipedia and those on Commons, as there is not 100% separation yet. Although not all the files on Commons are used on en-Wikipedia, there are 2,472,664 media files on Commons (as of 17 February 2008).

Other statistics

  • "Update: Phase 1: 5609||Phase 2: 143494||Phase 3: 19897||Total: 169000" – Betacommand – 01:17, 14 June 2007 (UTC)
  • "Update: Phase 1: 3670||Phase 2: 134616||Phase 3: 18918||Total: 157204" – Betacommand – 00:09, 26 July 2007 (UTC)

Weekly uploads and deletions and bot taggings

Provided by ST47:[1].

See User:BetacommandBot (operated by Betacommand) and User:ImageTaggingBot (operated by Carnildo).

+--------+---------+---------+------------+--------+----------------+-----------------+
| date   | uploads | deletes | net_change | STBotI | BetacommandBot | ImageTaggingBot |
+--------+---------+---------+------------+--------+----------------+-----------------+
| 200653 |   15430 |    9668 |       5762 |   NULL |           NULL |            NULL | 
| 200701 |   16806 |    6582 |      10224 |   NULL |           NULL |            NULL | 
| 200702 |   19384 |    6691 |      12693 |   NULL |            442 |            NULL | 
| 200703 |   19027 |    9415 |       9612 |   NULL |            844 |            NULL | 
| 200704 |   20418 |   10131 |      10287 |   NULL |           6730 |            NULL | 
| 200705 |   18804 |   27043 |      -8239 |   NULL |           3299 |            NULL | 
| 200706 |   18204 |   10096 |       8108 |   NULL |           3217 |            NULL | 
| 200707 |   18672 |   10308 |       8364 |   NULL |           1997 |            NULL | 
| 200708 |   18940 |   10477 |       8463 |   NULL |           3122 |            NULL | 
| 200709 |   18868 |   11518 |       7350 |   NULL |           1644 |            NULL | 
| 200710 |   19119 |    7684 |      11435 |   NULL |           1774 |            NULL | 
| 200711 |   19470 |    7527 |      11943 |   NULL |           1163 |            NULL | 
| 200712 |   19011 |   11122 |       7889 |   NULL |            387 |            NULL | 
| 200713 |   18766 |    9469 |       9297 |   NULL |            653 |            NULL | 
| 200714 |   19975 |   11849 |       8126 |   NULL |           2972 |            NULL | 
| 200715 |   19276 |   10312 |       8964 |   NULL |           1407 |            NULL | 
| 200716 |   18532 |   11924 |       6608 |   NULL |           2956 |            NULL | 
| 200717 |   18456 |   13181 |       5275 |   NULL |            111 |            NULL | 
| 200718 |   17389 |   10570 |       6819 |   NULL |           5831 |            NULL | 
| 200719 |   17170 |   30934 |     -13764 |   NULL |           3805 |            NULL | 
| 200720 |   18253 |   13030 |       5223 |   NULL |            587 |            NULL | 
| 200721 |   18168 |   11972 |       6196 |   NULL |          22225 |            NULL | 
| 200722 |   19529 |   19790 |       -261 |   NULL |          62662 |            NULL | 
| 200723 |   17637 |    9196 |       8441 |   NULL |           2451 |            NULL | 
| 200724 |   16805 |   13605 |       3200 |   NULL |          13403 |            NULL | 
| 200725 |   17398 |   19702 |      -2304 |   NULL |           4392 |            NULL | 
| 200726 |   17731 |   16973 |        758 |   NULL |          12172 |            NULL | 
| 200727 |   18195 |   23334 |      -5139 |   NULL |           8473 |            NULL | 
| 200728 |   17544 |   16781 |        763 |   NULL |           9015 |            NULL | 
| 200729 |   17113 |   21325 |      -4212 |   NULL |           5464 |            NULL | 
| 200730 |   17632 |   20553 |      -2921 |     32 |           6331 |            NULL | 
| 200731 |   15914 |   12006 |       3908 |   2873 |           4632 |            NULL | 
| 200732 |   16492 |   10832 |       5660 |   3958 |           1364 |            NULL | 
| 200733 |   17258 |    9891 |       7367 |   2452 |           2783 |            NULL | 
| 200734 |   13122 |   10716 |       2406 |   NULL |          12550 |            NULL | 
| 200735 |   13843 |    7803 |       6040 |   NULL |           5308 |            NULL | 
| 200736 |   12876 |    9269 |       3607 |    994 |          14208 |            NULL | 
| 200737 |   12197 |   18171 |      -5974 |   2837 |           9523 |            NULL | 
| 200738 |   11993 |    9864 |       2129 |    491 |          10063 |            NULL | 
| 200739 |   12993 |   12442 |        551 |   3595 |           8437 |            NULL | 
| 200740 |   11960 |    9683 |       2277 |   4369 |           4921 |            NULL | 
| 200741 |   12612 |   15307 |      -2695 |   2278 |           4050 |            NULL | 
| 200742 |   12511 |   10498 |       2013 |   2683 |          31843 |            NULL | 
| 200743 |   12911 |   17762 |      -4851 |    864 |          10358 |            NULL | 
| 200744 |   18094 |   16616 |       1478 |    302 |          49991 |            NULL | 
| 200745 |   11421 |   37841 |     -26420 |   1003 |          15117 |            NULL | 
| 200746 |   11824 |    9134 |       2690 |    388 |           3840 |              20 | 
| 200747 |   12180 |    5378 |       6802 |   2069 |          19049 |            1145 | 
| 200748 |   10898 |   12379 |      -1481 |   2604 |           9599 |             716 | 
| 200749 |   11124 |   13256 |      -2132 |   2232 |           7292 |             854 | 
| 200750 |   17773 |   13009 |       4764 |   1166 |          12370 |            1478 | 
| 200751 |   12634 |    8681 |       3953 |   1766 |           4878 |             940 | 
| 200752 |   12367 |    8348 |       4019 |   4381 |          57858 |             613 | 
| 200801 |   12283 |   13124 |       -841 |   2865 |           1960 |            1119 | 
| 200802 |   13864 |   11159 |       2705 |   2506 |           5982 |             855 | 
| 200803 |   12623 |   11855 |        768 |   2854 |          31301 |             719 | 
| 200804 |   11495 |    7652 |       3843 |   3559 |            164 |             487 | 
| 200805 |   11005 |    7356 |       3649 |   5867 |            293 |             133 | 
| 200806 |   11323 |   11318 |          5 |   5437 |          76583 |            1056 | 
| 200807 |    1255 |     687 |        568 |    362 |            900 |             201 | 
+--------+---------+---------+------------+--------+----------------+-----------------+
Graph 1: Image uploads over 58 weeks from January 2007
Graph 2: Image deletions over 58 weeks from January 2007
Graph 3: Image deletions and uploads over 58 weeks from January 2007
Graph 4: Images net change over 58 weeks from January 2007
Graph 5: Image tagging by BetacommandBot over 58 weeks from January 2007
Graph 6: Image tagging and image deletion spikes over 58 weeks from January 2007
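
The net_change column in the table above is simply uploads minus deletes. The following is a minimal sketch that re-checks that arithmetic for a handful of rows copied from the table and lists the weeks in which the image count shrank. The function names are illustrative, and reading the date column as a year followed by a week number is an assumption.

```python
# A few rows copied from the table: (date, uploads, deletes, reported net_change).
# The date is assumed to be a year followed by a week number (e.g. 200705).
ROWS = [
    ("200705", 18804, 27043, -8239),
    ("200719", 17170, 30934, -13764),
    ("200722", 19529, 19790, -261),
    ("200745", 11421, 37841, -26420),
    ("200806", 11323, 11318, 5),
]


def net_change(uploads: int, deletes: int) -> int:
    """Net change in the number of images for one week."""
    return uploads - deletes


def shrinking_weeks(rows):
    """Return the dates of weeks in which deletions exceeded uploads."""
    return [date for date, uploads, deletes, _ in rows
            if net_change(uploads, deletes) < 0]


if __name__ == "__main__":
    # Confirm that each reported net_change matches uploads - deletes.
    for date, uploads, deletes, reported in ROWS:
        assert net_change(uploads, deletes) == reported, date
    print(shrinking_weeks(ROWS))
```

Run over the full table, the same check would pick out every week with a negative net change, which is where the deletion spikes discussed below show up.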

Conclusions

  • The totals include both free and non-free images, which skews the picture in several ways.
  • The rate of uploads remained fairly steady in the short term, but declined steadily over 2007 from around 20,000 per week to around 10,000 per week. It is estimated that images uploaded with a non-free tag account for about a quarter of the uploads.
  • The rate of image deletions fluctuated much more dramatically, as can be seen in the third graph. There are a variety of reasons for deletions, and various bot deletions that could account for this (most probably User:MiszaBot), but there are also deletions due to transfer to Commons, speedy deletions at user request, and suchlike.
  • Overall, the number of images increased steadily over the year, but the increase does seem to be tailing off.
  • BetacommandBot's tagging was sporadic and irregular in quantity, partly due to the deferral of tagging of legacy images until the start of 2008, which accounts for the increased tagging in the first few weeks of 2008.
  • The final graph shows that one of the three deletion spikes followed a spike of tagging by BetacommandBot. Other deletion spikes may have been due to deletions by User:MiszaBot (not included in the above data) or other factors unrelated to bot tagging. More data collection and analysis is needed to confirm this.
  • One overall conclusion is that much more work needs to be done to check and verify the "free" images being uploaded. An over-emphasis on non-free images may have led to less focus on checking the validity of images incorrectly labelled "free" (i.e. copyvios).

More data

Compare 22 September 2007 [320,733 images] (note the need to remove the duplicate line for non-free promotional) with 12 March 2008 [280,284 images] – a decrease of 40,449 non-free images.
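
As a quick check of the figures above, the sketch below recomputes the decrease and expresses it as a share of the September 2007 total. The two counts are taken from the text; the function names are illustrative only.

```python
NON_FREE_SEP_2007 = 320_733  # non-free images on 22 September 2007
NON_FREE_MAR_2008 = 280_284  # non-free images on 12 March 2008


def decrease(before: int, after: int) -> int:
    """Absolute drop in the number of images."""
    return before - after


def percent_decrease(before: int, after: int) -> float:
    """Drop as a percentage of the earlier figure."""
    return 100 * decrease(before, after) / before


if __name__ == "__main__":
    print(decrease(NON_FREE_SEP_2007, NON_FREE_MAR_2008))
    print(round(percent_decrease(NON_FREE_SEP_2007, NON_FREE_MAR_2008), 1))
```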

Ensuring compliance

The problem we have is the following: "By March 23, 2008, all existing files under an unacceptable license as per the above must either be accepted under an EDP [Exemption Doctrine Policy], or shall be deleted." The way the EDP is worded on en-Wikipedia, we cannot tell which of our images are compliant and which are not. In fact, compliance varies according to how the image is used: take an image and overuse it in lots of articles, and suddenly the image becomes non-compliant and eligible for deletion (in reality, the solution is to take the image out of the articles it shouldn't be in).

In practice, the machine-readable parts of the EDP are the license tag [10b] (we are fairly good on that now, as all the non-free tags have "non-free" in their titles, and images without license tags are routinely deleted), the source [10a] (we need to develop more widespread and rigorous use of a source template, like {{information}}, which is now being used on Commons and here), and the rationale [10c] (we are gradually moving towards having a majority, though not all, of non-free images using some form of "rationale" template). Criteria 7 (use in an article, i.e. no orphaned non-free images) and 9 (inappropriate locations such as the wrong namespace) can largely be assessed by bots. Criterion 4 (previous publication) needs a combination of the two, with bots assessing whether humans are correctly filling in a template: we need to develop a template field that allows people to specify where a non-free image was previously published, and get a bot to demand compliance there. Criteria 1, 2, 3, 5, 6 and 8 (replaceable, commercial opportunities, minimal use, encyclopedic nature, image policy and significance) all need human input to decide whether an image is compliant.

That gives an idea of how far we have come and how far we still have to go. My view is that we aren't even at a stage where we can reliably say whether any images are 100% compliant, but we are making progress.
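
The mechanical checks described above (criteria 7, 9 and 10b) could look something like the following. This is a minimal sketch, not how any existing bot works: the ImagePage and Usage classes and the three function names are hypothetical, the approved-tag set is a tiny illustrative subset, and the limited exceptions to criterion 9 are ignored.

```python
from dataclasses import dataclass, field
from typing import List

ARTICLE_NAMESPACE = 0  # MediaWiki's main (article) namespace number

# Illustrative subset only; a real check would use the full approved list.
APPROVED_NONFREE_TAGS = {"Non-free logo", "Non-free album cover"}


@dataclass
class Usage:
    """One page on which an image is displayed (hypothetical structure)."""
    title: str
    namespace: int


@dataclass
class ImagePage:
    """The parts of an image page a bot could read (hypothetical structure)."""
    name: str
    templates: List[str] = field(default_factory=list)
    usages: List[Usage] = field(default_factory=list)


def is_orphaned(page: ImagePage) -> bool:
    """Criterion 7: the image is used in no article at all."""
    return not any(u.namespace == ARTICLE_NAMESPACE for u in page.usages)


def misplaced_usages(page: ImagePage) -> List[Usage]:
    """Criterion 9: usages outside the article namespace (exceptions ignored)."""
    return [u for u in page.usages if u.namespace != ARTICLE_NAMESPACE]


def has_approved_tag(page: ImagePage) -> bool:
    """Criterion 10b: at least one template is on the approved tag list."""
    return any(t in APPROVED_NONFREE_TAGS for t in page.templates)
```

Matching against an explicit list rather than searching for "non-free" in the template name avoids mistaking a rationale template such as Template:Non-free use rationale for a license tag. Each function only flags a candidate; whether a flagged image is actually a problem remains a human decision.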

Accordingly, I propose that 13 subpages be set up here to address the compliance issues for all 10 of the Non-free content criteria (NFCC) and their subcriteria, and to discuss how best to use bots and humans (working together) to ensure compliance and come up with a workable system that will slowly but surely improve the image documentation and compliance on en-Wikipedia.

The following are labelled either as HUMAN or BOT (or both) depending on who can carry out the checks required. Generally agreed principles and simple checks should be just that, simple. Case-by-case discussions are more complex, and may require specialist input and copyright discussions, at WP:IFD or elsewhere.

  • /1 WP:NFCC#1 – replaceable (must be replaced if possible)
    • HUMAN (case-by-case discussion)
  • /2 WP:NFCC#2 – commercial opportunities (our use mustn't compete with commercial use)
    • HUMAN (case-by-case discussion)
  • /3
    • /3a WP:NFCC#3a – minimal use (low resolutions and reduce non-free use per article and in Wikipedia as a whole)
      • HUMAN + BOT (algorithm and generally agreed principles)
    • /3b WP:NFCC#3b – minimal extent of use (don't overuse individual non-free images)
      • HUMAN + BOT (algorithm and generally agreed principles)
  • /4 WP:NFCC#4 – previous publication (must have been previously published)
    • HUMAN (simple check of the details provided)
  • /5 WP:NFCC#5 – content (generally encyclopedic)
    • HUMAN (simple check against examples of non-encyclopedic content)
  • /6 WP:NFCC#6 – image policy (complies with Wikipedia:Image use policy)
    • HUMAN (simple check against examples of non-NFC violations of image policy)
  • /7 WP:NFCC#7 – not orphaned (all non-free images used in at least one article)
    • BOT (purely mechanical check – humans needed to guard against vandalism)
  • /8 WP:NFCC#8 – significance (non-free images must contribute significantly to the article)
    • HUMAN (case-by-case discussion)
  • /9 WP:NFCC#9 – location (with limited exceptions, non-free images only used in article namespace)
    • BOT (purely mechanical check)
  • /10
    • /10a WP:NFCC#10a – source (attribute the original source and copyright owner and intermediate sources)
      • HUMAN (simple check of the details provided)
    • /10b WP:NFCC#10b – copyright tag (use one of the copyright tags suggested when uploading)
      • BOT (purely mechanical check against approved list of copyright tags)
    • /10c WP:NFCC#10c – rationale (explain the reason for using the non-free image and name the article it is used in)
      • HUMAN + BOT (algorithm detects existence of rationale and article name and overuse of images outside articles mentioned in rationales – humans check whether the rationale or overuse is valid or not, remembering that some of the components of a good rationale directly or indirectly address other aspects of the NFCC, specifically (e.g. in Template:Non-free use rationale) source [#10a], description [#5 and #6], portion [#2 and #3a], resolution [#2 and #3a], purpose of use [#5 and #8] and replaceability [#1])
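
The division of labour in the list above can be written down as a small lookup table, which makes it easy to ask which criteria a bot could handle alone and which need a person. This is only a sketch of the proposal as listed; the names CHECKERS, criteria_for and bot_only are illustrative.

```python
# Who can carry out the check for each criterion, as labelled in the list above.
CHECKERS = {
    "1": {"HUMAN"},
    "2": {"HUMAN"},
    "3a": {"HUMAN", "BOT"},
    "3b": {"HUMAN", "BOT"},
    "4": {"HUMAN"},
    "5": {"HUMAN"},
    "6": {"HUMAN"},
    "7": {"BOT"},
    "8": {"HUMAN"},
    "9": {"BOT"},
    "10a": {"HUMAN"},
    "10b": {"BOT"},
    "10c": {"HUMAN", "BOT"},
}


def criteria_for(checker: str) -> list:
    """Criteria in which the given checker ("HUMAN" or "BOT") takes part."""
    return [c for c, who in CHECKERS.items() if checker in who]


def bot_only() -> list:
    """Criteria labelled as purely mechanical checks."""
    return [c for c, who in CHECKERS.items() if who == {"BOT"}]


if __name__ == "__main__":
    print(len(CHECKERS))        # one entry per proposed subpage
    print(bot_only())
    print(criteria_for("BOT"))
```

The table has 13 entries, matching the 13 proposed subpages; the /3 and /10 entries in the list are parents of subcriteria rather than checks in their own right.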

Machine-readable results

Ideally, once a human or bot has checked an image for compliance with the ten criteria listed above, they will edit the image page to indicate that this has been done, and periodic sweeps with a bot will indicate how many images have been checked for each criterion, and how many remain to be checked.
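
A periodic sweep of this kind might be sketched as follows. How a completed check would actually be recorded on an image page is not specified here, so the sketch simply assumes each image comes with a set of criterion labels that have been marked as checked; the sweep function and that representation are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

CRITERIA = ["1", "2", "3a", "3b", "4", "5", "6", "7", "8", "9", "10a", "10b", "10c"]


def sweep(images: Dict[str, Set[str]]) -> Dict[str, Tuple[int, int]]:
    """Count, per criterion, how many images are checked and how many remain.

    `images` maps an image name to the set of criteria marked as checked
    on its page (an assumed, hypothetical representation).
    """
    checked = Counter()
    for marks in images.values():
        # Ignore any labels that are not recognised criteria.
        checked.update(marks & set(CRITERIA))
    total = len(images)
    return {c: (checked[c], total - checked[c]) for c in CRITERIA}


if __name__ == "__main__":
    sample = {
        "A.jpg": {"7", "9"},
        "B.png": {"7"},
        "C.gif": set(),
    }
    for criterion, (done, remaining) in sweep(sample).items():
        print(criterion, done, remaining)
```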

Bot-generated lists

Humans can check bot-generated lists and feed the results back to bot operators who will use the bots to update the machine-readable parts of the image page. This is an ideal way for bots and humans to work together.

Poor compliance coverage

Compliance checking is generally poor for WP:NFCC#4 – previous publication. Many images do not explicitly state where the image was previously published or first published (sometimes important for considerations of copyright expiration), though reference to the source is sometimes an implicit pointer to previous publication.

Work schedule

Bots

State here what bots and similar tools can usefully do, and what their limitations are.

Humans

State here what humans can usefully do, and what their limitations are.

The main limitations are that they are often intrinsically lazy (will defer fixing things until they are forced to), they are prone to error and misunderstandings (will sometimes just get things wrong), and are also prone to failures of comprehension (will read a policy and not understand it) and incompetence (writing a bad rationale). They can also be dishonest (deliberately uploading copyright violating images) and egocentric (insisting they are right) and can get cranky and frustrated (shouting at others for getting things wrong). On the plus side, they can understand things that a bot never will be able to, they can be kind and responsive to others (telling them how they should be doing things), they can be proactive (writing to people to get permissions), they can write great rationales (clearly explaining why an image should be used) and they can program bots (a very useful skill).

  1. Some of the editors who work on enforcing our non-free content policy are routinely harassed and verbally abused. Taking some time to watch the talk pages of those who conduct this work, and to deal with the routine insults, would go a long way to providing support for this work.

Resources

Documentation

List policies and guidelines.

Policies

Guidelines

Tools

List some common tools used to deal with non-free images.

Rationale templates

List some commonly accepted rationales that can be written with the help of templates, but warn that this is strictly limited and any other types of images should use the generic rationale template that needs to be filled in with specific details.

Generic style

Older style

Newer style

Noticeboards

List some noticeboards specifically for these issues.

See also

Please incorporate the following links into the page above, as and when needed.
