THC Science

I've written a Python program to find Wikipedia articles that need photos near a particular location.

It's far from perfect: it produces false negatives and false positives for various reasons. If you notice a bug that you think I could fix, or if you have any questions, let me know at User talk:Mx. Granger.

You may also find Special:Nearby, WikiShootMe, Wikidata locations, or the Commons mobile app useful. You can find older versions of the program in this page's history.

I release all of the text on this page, including the program, under CC0.

How to use

1. Make sure that Python 3 is installed on your computer.
2. Make sure you have requests installed (run pip install requests in the terminal, or possibly pip3 install requests).
3. Save the program with a name such as findNeededPhotos.py, and change the values labeled "REQUIRED" near the top of the file.
4. Run the program in the terminal with python findNeededPhotos.py (or possibly python3 findNeededPhotos.py). After a little while (usually less than a minute, depending on how many Wikipedia articles there are in the area), it will start displaying a list of articles that need pictures, sorted by distance from the location you entered.

For more information see mw:Extension:GeoData#list=geosearch and mw:API:Images.
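
As a rough illustration of the geosearch request the program makes, the sketch below builds the query parameters as a dictionary and reads the titles out of the parsed JSON. The helper names build_geosearch_params and titles_from_geosearch are hypothetical and not part of the program, and the sample response is hand-written to mirror the shape of a geosearch result, not real API output.

```python
def build_geosearch_params(lat, lon, radius_m=10000, limit=500):
    """Build query parameters for list=geosearch (hypothetical helper)."""
    # The program notes that the API does not support radii above 10000 meters,
    # so larger values are clamped here.
    radius_m = min(int(radius_m), 10000)
    return {
        "format": "json",
        "action": "query",
        "list": "geosearch",
        "gscoord": f"{lat}|{lon}",
        "gsradius": str(radius_m),
        "gslimit": str(limit),
    }


def titles_from_geosearch(data):
    """Return page titles from a parsed geosearch response, in the order given."""
    return [place["title"] for place in data.get("query", {}).get("geosearch", [])]


# Hand-written sample in the shape of a geosearch response (not real API output).
sample = {"query": {"geosearch": [
    {"pageid": 1, "title": "Example Monument", "dist": 12.3},
    {"pageid": 2, "title": 'The "Quoted" Place', "dist": 480.0},
]}}

params = build_geosearch_params(27.175, 78.041944, radius_m=25000)
print(params["gscoord"], params["gsradius"])
print(titles_from_geosearch(sample))
```

To send the request, the dictionary can be passed as params= to requests.get together with the headers defined in the program; requests then takes care of the URL encoding.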

The program

import requests
import time
import codecs

####################################################
######## Options to be changed by the user #########

# REQUIRED: Set this to your username or name and a way for the WMF to contact you
# This is needed because the program uses the API, see [[meta:User-Agent policy]] for details
headers = {'User-Agent': 'Insert username or name and contact info here'}

# REQUIRED: Change these numbers to the latitude and longitude you're interested in. This can be your current location, somewhere you're planning to travel to, or any place where you want to take pictures for Wikipedia.
mylat = 27.175
mylong = 78.041944

# Optional: If you want to search a different (non-English) Wikipedia, you can change the language code.
lang = "en"

# Optional: You can change this number to increase or decrease the maximum number of articles the program will output. (It may still output fewer than the maximum.)
maxplaces = 40

# The maximum distance (in meters) between the location inputted above and the articles that will be returned. Unfortunately the API doesn't support distances greater than 10000 meters.
maxdistance = 10000

####################################################
####################################################


def stripcomments(text):
  text = text + "  " #This line is to deal with the rare case where an article ends with "<" or "<-"
  newtext = ""
  comment = False
  for i in range(0,len(text)):
    if(text[i]=='<' and text[i+1]=='!' and text[i+2]=='-'):
      comment=True
    elif(comment==False):
      newtext = newtext + text[i]
    elif(comment==True):
      if(text[i-2]=='-' and text[i-1]=='-' and text[i]=='>'):
        comment=False

  return newtext


mylat = str(mylat)
mylong = str(mylong)
maxdistance = str(maxdistance)
listOfNearbyArticles = []

#Pass the query as a params dictionary so that requests handles the URL encoding
params = {"format": "json", "action": "query", "list": "geosearch", "gscoord": mylat + "|" + mylong, "gsradius": maxdistance, "gslimit": "500"}
url= "https://" + lang + ".wikipedia.org/w/api.php"
r=requests.get(url, params=params, headers=headers)

#Read the titles from the parsed JSON rather than splitting the raw text, which breaks on titles containing quotation marks
for place in r.json().get("query", {}).get("geosearch", []):
  listOfNearbyArticles.append(place["title"])

#This is the list of file extensions which I consider to indicate a plausibly good image (an image that probably disqualifies the article in question from being outputted by my program)
listOfFileExtensions = [".jpg", ".JPG", ".jpeg", ".JPEG"]

listOfArticlesToOutput = []

for title in listOfNearbyArticles:
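  urlTitle = requests.utils.quote(title, safe="") #percent-encode the whole title so that characters such as "&", "+", "?" and "#" don't break the URL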
  url= "https://" + lang + ".wikipedia.org/w/api.php?format=json&action=query&prop=images&titles=" + urlTitle + "&imlimit=40"
  r=requests.get(url, headers=headers)
  textFromAPI = r.text

  if any(x in textFromAPI for x in listOfFileExtensions): #The page probably has at least one good image
    #Second check
    url= "https://" + lang + ".wikipedia.org/w/api.php?format=json&action=parse&page=" + urlTitle + "&prop=wikitext"
    r=requests.get(url, headers=headers)
    article = stripcomments(r.text)

    if any(x in article for x in listOfFileExtensions):
      pass #both the list of images and the wikitext contain one of our file extensions, so it seems the article has a good image
    else: #The page has a good image, but it's not in the wikitext, so probably in a navigation template or similar
      listOfArticlesToOutput.append(title)
      print(title) #print here too, so every article counted toward maxplaces is actually displayed
  else: #The page has no good images
    listOfArticlesToOutput.append(title)
    print(title)
  
  if(len(listOfArticlesToOutput)==maxplaces):
    break
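
The two-stage check in the loop above can also be sketched on parsed data instead of raw response text. This is a minimal sketch, not the program's own code: has_jpeg, strip_comments, and needs_photo are hypothetical names, the sample response is hand-written in the shape that prop=images returns, and the regular expression only removes properly closed comments (unlike stripcomments, which also drops everything after an unclosed comment opener).

```python
import re

JPEG_EXTENSIONS = (".jpg", ".jpeg")


def has_jpeg(images_response):
    """True if any file listed in a parsed prop=images response is a JPEG."""
    pages = images_response.get("query", {}).get("pages", {})
    for page in pages.values():
        for image in page.get("images", []):
            # Compare in lowercase so ".JPG" and ".JPEG" match as well.
            if image["title"].lower().endswith(JPEG_EXTENSIONS):
                return True
    return False


def strip_comments(wikitext):
    """Remove closed HTML comments (<!-- ... -->) from wikitext."""
    return re.sub(r"<!--.*?-->", "", wikitext, flags=re.DOTALL)


def needs_photo(images_response, wikitext):
    """True if the page lists no JPEG, or no JPEG name appears in the wikitext outside comments."""
    if not has_jpeg(images_response):
        return True
    visible = strip_comments(wikitext).lower()
    return not any(ext in visible for ext in JPEG_EXTENSIONS)


# Hand-written sample (not real API output): the only JPEG comes from a template.
sample_images = {"query": {"pages": {"123": {"title": "Example", "images": [
    {"ns": 6, "title": "File:Commons-logo.svg"},
    {"ns": 6, "title": "File:Navbox photo.JPG"},
]}}}}
sample_wikitext = "Intro text. <!-- [[File:Old photo.jpg]] --> {{Some navbox}}"

print(needs_photo(sample_images, sample_wikitext))
```

Working on parsed JSON means a file name is only matched where it actually appears as an image title, at the cost of one call to r.json() per response.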
