Categories
miscellaneous

Price monitoring of skyr with Python and Django

Skyr is expensive, but always on sale. At https://etilbudsavis.dk/search/skyr you can find the current offers from the weekly store flyers.

I have built a small website that monitors skyr prices in Greater Copenhagen. You can find it at https://wallnot.dk/skyr.

This is what the website looks like, with historical and (a tiny bit of) future skyr prices.

How it works…

Skyrpriser (the skyr price monitor) consists of:

  • A database with a model defined in Django's models.py
  • A Python script, cron_skyrpriser.py, that runs as a job once a day, scrapes skyr offers from https://etilbudsavis.dk and saves them in the database
  • A view in Django's views.py that prepares the data from the database in a structure the page template can use
  • A template (index.html) that contains the page's HTML, stylesheet and the JavaScript that, with the help of the Chart.js library, generates the chart of skyr prices

I'll start with the data model in models.py. The main table is called "Offer" and stores the type of skyr, the store, the date the offer is valid for and the offer's price per kilo:

from django.db import models
from django.utils import timezone
from django.contrib import admin

class Offer(models.Model):
	skyr_type = models.CharField('Skyrtype', max_length=100)
	store = models.CharField('Butik', max_length=100)
	date = models.DateField('Dato')
	price_per_kilo = models.FloatField('Kilopris')
	added_at = models.DateTimeField('Tilføjelsesdato', default=timezone.now, editable=False)
	
class OfferAdmin(admin.ModelAdmin):
	list_display = ('store','skyr_type','date')
	list_filter = ('store', 'skyr_type')
	search_fields = ['store', 'skyr_type']

Next is cron_skyrpriser.py. I used my browser's developer tools to figure out how to talk to the etilbudsavis.dk API and get data back in JSON format. I fetch the fields I need and save them in the database, unless they are already there:

import requests
from datetime import datetime, date, timedelta
from bs4 import BeautifulSoup
import psycopg2
from psycopg2 import Error
import pytz

now = datetime.now()
cph = pytz.timezone('Europe/Copenhagen')

# Connect to database
try:
	connection = psycopg2.connect(user = "[slettet]",
									password = "[slettet]",
									host = "[slettet]",
									port = "",
									database = "[slettet]")
	cursor = connection.cursor()
except (Exception, psycopg2.Error) as error:
	print ("Error while connecting to PostgreSQL", error)

### INSERT SKYR IN DATABASE FUNCTION ###

def insert_in_database(connection, offer):
	with connection:
		with connection.cursor() as cur:
			try:
				sql = ''' SELECT * from skyrpriser_offer WHERE skyr_type = %s AND store = %s AND date = %s'''
				cur.execute(sql, (offer[0], offer[1], offer[2]))
				results = cur.fetchall()
				if not results:
					sql = ''' INSERT INTO skyrpriser_offer(skyr_type,store,date,price_per_kilo,added_at)
					VALUES(%s,%s,%s,%s,%s)'''
					cur.execute(sql, offer)	
			except Error as e:
				print(e, offer)

# Scrape prices of skyr and save to database
def main():
	url = "https://etilbudsavis.dk/api/squid/v2/sessions"
	session = requests.Session()
	headers = {
		'authority': 'etilbudsavis.dk',
		'accept': 'application/json',
		'dnt': '1',
		'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36',
		'x-api-key': '[slettet]',
		'sec-fetch-site': 'same-origin',
		'sec-fetch-mode': 'cors',
		'sec-fetch-dest': 'empty',
		'referer': 'https://etilbudsavis.dk/search/skyr',
		'accept-language': 'en-US,en;q=0.9',
		'cookie': 'sgn-flags=^{^%^22flags^%^22:^{^}^}; sgn-consents=^[^]',
	}
	session.headers.update(headers)
	response = session.get(url)
	url = "https://etilbudsavis.dk/api/squid/v2/offers/search?query=skyr&r_lat=55.695497&r_lng=12.550145&r_radius=20000&r_locale=da_DK&limit=24&offset=0"
	response = session.get(url)
	response_json = response.json()

	for item in response_json:
		skyr_type = item['heading']
		store = item['branding']['name']
		valid_from = item['run_from']
		valid_from = datetime.strptime(valid_from, '%Y-%m-%dT%H:%M:%S%z').astimezone(cph).date()
		valid_to = item['run_till']
		valid_to = datetime.strptime(valid_to, '%Y-%m-%dT%H:%M:%S%z').astimezone(cph).date()
		price = item['pricing']['price']
		amount = item['quantity']['size']['from']
		measure = item['quantity']['unit']['symbol']
		if measure == "g":
			price_per_kilo = price/amount*1000
		elif measure == "kg":
			price_per_kilo = price/amount
		else:
			# Skip offers in units that cannot be converted to a price per kilo
			continue
		number_of_days = (valid_to - valid_from).days
		for day in range(number_of_days+1):
			# One row per day the offer is valid
			offer_date = valid_from + timedelta(day)
			offer = (skyr_type, store, offer_date, price_per_kilo, now)
			insert_in_database(connection, offer)

main()
print("Opdaterede skyrpriser")
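
The unit conversion and the expansion of an offer into one row per day are the two pieces of logic in the loop above that are easiest to get wrong. Below is a minimal sketch of them as standalone functions. The function names `price_per_kilo` and `offer_days` are hypothetical and not part of the original script, and only the units "g" and "kg" that the script handles are assumed.

```python
from datetime import date, timedelta


def price_per_kilo(price, amount, unit):
    """Return the price per kilo, or None if the unit is not g or kg."""
    if unit == "g":
        return price / amount * 1000
    if unit == "kg":
        return price / amount
    return None


def offer_days(valid_from, valid_to):
    """Return every date from valid_from to valid_to, both included."""
    number_of_days = (valid_to - valid_from).days
    return [valid_from + timedelta(days=day) for day in range(number_of_days + 1)]


if __name__ == "__main__":
    # Sample values, not real offers
    print(price_per_kilo(12.0, 450, "g"))
    print(offer_days(date(2020, 5, 1), date(2020, 5, 3)))
```

Returning None for an unknown unit leaves it to the caller to decide whether to skip the offer or log it.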

In Django's views.py I fetch the contents of the database table and prepare them with a few loops, which are probably rather inefficient but work OK:

from django.shortcuts import render
from .models import Offer
from django.db.models import Max, Min
from datetime import timedelta

# Main page
def skyrindex(request):
	offers = Offer.objects.all().order_by('date')
	context = {}
	if offers:
		date_min = Offer.objects.aggregate(Min('date'))['date__min']
		date_max = Offer.objects.aggregate(Max('date'))['date__max']
		number_of_days = (date_max - date_min).days
		dates = []
		for i in range(number_of_days + 1):
			dates.append(date_min + timedelta(i))
		
		structure = {}
		for offer in offers:
			if not offer.store in structure:
				structure[offer.store] = {}
			if not offer.skyr_type in structure[offer.store]:
				structure[offer.store][offer.skyr_type] = []
			structure[offer.store][offer.skyr_type].append({offer.date: round(offer.price_per_kilo, 1)})

		new_structure = {}
		for store, offer in structure.items():
			new_structure[store] = {}
			for skyr_type, prices in offer.items():	
				new_structure[store][skyr_type] = []
				for date in dates:
					have_price = False
					for price in prices:
						if date in price:
							new_structure[store][skyr_type].append({date: price[date]})
							have_price = True
					if not have_price:
						new_structure[store][skyr_type].append({date: ","})

		context = {'dates': dates, 'structure': structure, 'new_structure': new_structure}
	return render(request, 'skyrpriser/index.html', context)
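
The nested loops in the view scan every price for every date. One possible alternative, sketched here in plain Python without Django, is to index the prices by date in a dictionary first, so that each date becomes a single lookup. The function name `build_series` and the sample data are hypothetical. Unlike the view, the sketch returns a flat list of values per product rather than a list of one-item dictionaries, so the template would need a matching adjustment.

```python
from datetime import date, timedelta


def build_series(offers, dates, missing=","):
    """Group (store, skyr_type, day, price) tuples into one price list per product.

    Each list has exactly one entry per date in `dates`; days without an
    offer get the `missing` placeholder.
    """
    prices_by_product = {}
    for store, skyr_type, day, price in offers:
        prices_by_product.setdefault((store, skyr_type), {})[day] = round(price, 1)

    series = {}
    for (store, skyr_type), prices in prices_by_product.items():
        series.setdefault(store, {})[skyr_type] = [
            prices.get(day, missing) for day in dates
        ]
    return series


if __name__ == "__main__":
    # Sample data, not real offers
    start = date(2020, 5, 1)
    dates = [start + timedelta(days=i) for i in range(3)]
    offers = [
        ("Netto", "Skyr naturel", date(2020, 5, 1), 19.96),
        ("Netto", "Skyr naturel", date(2020, 5, 3), 22.0),
    ]
    print(build_series(offers, dates))
```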

And finally there is index.html, which also has some (for me) rather complicated loops that structure the data in a format the JavaScript library Chart.js can understand.

I make use of some features of Django's template loops that I think are clever:

  • cycle: Steps through a predefined series of values, one each time the loop runs. Here it is used to give each line in the chart a new colour.
  • forloop.last: The variable forloop.last is set the last time a loop runs. That makes it easy, for example, to put a comma after every date on the chart's x-axis except the last one in the list.
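
For readers more at home in Python than in Django's template language, the two features have close plain-Python analogues. The sketch below only illustrates the behaviour and is not Django's implementation: `itertools.cycle` repeats a fixed sequence of values much like `cycle` does, and `str.join` puts a separator between items but not after the last one, which is what the `forloop.last` check achieves. The colours and labels are sample values.

```python
from itertools import cycle

colours = cycle(["#e9ecef", "#ffc9c9", "#fcc2d7"])
lines = ["Netto, Skyr naturel", "Rema 1000, Skyr vanilje",
         "Føtex, Skyr jordbær", "Lidl, Skyr naturel"]

# Like {% cycle %}: each line gets the next colour, starting over when the list runs out
line_colours = [(line, next(colours)) for line in lines]

# Like {% if not forloop.last %}, {% endif %}: a comma after every label except the last
labels = ["01. maj", "02. maj", "03. maj"]
label_list = ", ".join("'" + label + "'" for label in labels)

print(line_colours[3])
print(label_list)
```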

Here is index.html:

<h1>Skyrpriser</h1>
		
<canvas id="myChart"></canvas>
<script>
var ctx = document.getElementById('myChart');
var myChart = new Chart(ctx, {
    type: 'line',
    data: {
        labels: [{% for date in dates %}'{{ date|date:"d. b" }}'{% if not forloop.last %}, {% endif %}{% endfor %}],
        datasets: [{% for store, offer in new_structure.items %}
						{% for skyr_type, prices in offer.items %}
							{
							fill: false,
							backgroundColor: {% cycle "'#e9ecef'," "'#ffc9c9'," "'#fcc2d7'," "'#eebefa'," "'#d0bfff'," "'#bac8ff'," "'#a5d8ff'," "'#99e9f2'," "'#96f2d7'," "'#b2f2bb'," "'#d8f5a2'," "'#ffec99'," "'#ffd8a8'," %}
							borderColor: {% cycle "'#e9ecef'," "'#ffc9c9'," "'#fcc2d7'," "'#eebefa'," "'#d0bfff'," "'#bac8ff'," "'#a5d8ff'," "'#99e9f2'," "'#96f2d7'," "'#b2f2bb'," "'#d8f5a2'," "'#ffec99'," "'#ffd8a8'," %}
							label: '{{ store|safe }}, {{ skyr_type|safe }}',
							data: 	[
									{% for price in prices %}
										{% for date, cost in price.items %}
											{% if not cost == "," %}
											{{ cost|unlocalize }}
											{% endif %}
										{% endfor %}
										{% if not forloop.last %}
										,
										{% endif %}
									{% endfor %}
									]
							}{% if not forloop.last %},{% endif %}
						{% endfor %}
						{% if not forloop.last %},{% endif %}
					{% endfor %}]
    },
    options: {
		responsive: true,
		spanGaps: false,
		title: {
			display: true,
			text: 'Tilbud på Skyr over tid'
		},
		tooltips: {
			mode: 'index',
			intersect: false,
		},
		hover: {
			mode: 'nearest',
			intersect: true
		},		
		scales: {
			xAxes: [{
				display: true,
				scaleLabel: {
					display: true,
					labelString: 'Dato'
				}
			}],
			yAxes: [{
				display: true,
				scaleLabel: {
					display: true,
					labelString: 'Pris pr. kilo Skyr i kroner'
				}
			}]
		}


    }
});
</script>

Calendar with week numbers and holidays

I have made yet another Django experiment, with https://ugenr.dk as the great role model.

A digital calendar

At https://wallnot.dk/kalender you can always find a Mayland-style calendar with holidays and week numbers. The calendar supports the years 1 to 9999.

It is built from three page functions: one that always shows the current half-year (kalindex), one that shows any given half-year (kalperiod) and one that shows any given year (kalyear). All three call a fourth function, get_dates, which returns the calendar dates for the requested year and/or half-year.

Here is views.py:

from django.shortcuts import render
import datetime
from workalendar.europe import Denmark	# Module containing most Danish holidays
from django.http import Http404

# Function to return all calendar dates and other context data
def get_dates(year, period, now):
	now_isocalendar = now.isocalendar()
	
	### HOLIDAY LIST FOR YEAR IS GENERATED ###
		
	# Create dictionary with all holidays of the year
	holidays = Denmark().holidays(year)
	
	all_holidays = {}
	all_holidays[datetime.date(year,5,1)] = ["Første maj", "Særlig dag"]
	all_holidays[datetime.date(year,6,5)] = ["Grundlovsdag", "Særlig dag"]
	all_holidays[datetime.date(year,12,31)] = ["Nytårsaften", "Særlig dag"]

	holiday_lookup = {
						"New year": ["Nytårsdag", "Helligdag"],
						"Holy Thursday": ["Skærtorsdag", "Helligdag"],
						"Good Friday": ["Langfredag", "Helligdag"],
						"Easter Sunday": ["Påskedag", "Helligdag"],
						"Easter Monday": ["2. påskedag", "Helligdag"],
						"Store Bededag": ["Store bededag", "Helligdag"],
						"Ascension Thursday": ["Kr. himmelfart", "Helligdag"],
						"Pentecost Sunday": ["Pinsedag", "Helligdag"],
						"Pentecost Monday": ["2. pinsedag", "Helligdag"],
						"Christmas Eve": ["Juleaften", "Særlig dag"],
						"Christmas Day": ["1. juledag", "Helligdag"],
						"Second Day of Christmas": ["2. juledag", "Helligdag"],
					}
	
	for holiday in holidays:
		# Check for two holidays on same day
		if holiday[0] not in all_holidays:
			all_holidays[holiday[0]] = (holiday_lookup[holiday[1]][0], holiday_lookup[holiday[1]][1])
		# If two on the same day, names are concatenated
		else:
			all_holidays[holiday[0]] = (holiday_lookup[holiday[1]][0] + "/" + all_holidays[holiday[0]][0] , holiday_lookup[holiday[1]][1])
	
	### DATES FOR YEAR ARE GENERATED IN A DAY AND MONTH DIMENSION ###
	
	# First dimension is maximum number of days in a month
	dates_in_year = {}
	for day in range(1,32):
		dates_in_year[day] = []
	
	# Second dimension is that date for each month
	for day in range(1,32):
		for month in period:
			# If the generated day actually is a valid date, day is added to dates_in_year dictionary
			try:
				date_to_add = datetime.date(year,month,day)
				date_isocalendar = date_to_add.isocalendar()
								
				# HOLIDAY LOGIC #
				# If day is special, get type of day and name of day
				if date_to_add in all_holidays:
					type_of_day = all_holidays[date_to_add][1]
					name_of_day = all_holidays[date_to_add][0]
				# If not, type of day is normal and no name
				else:
					type_of_day = "Normal dag"
					name_of_day = "Intet navn"
				
				# HTML BORDER CLASS LOGIC #
				html_class = ""
				
				# Year of date must be the same as year of current date
				if date_isocalendar[0] == now_isocalendar[0]:
					# Week number is the same as current week number
					if date_isocalendar[1] == now_isocalendar[1]:
						# All days get a red right and red left class
						html_class = "redleft redright"
						# Sunday also gets a red bottom class
						if date_isocalendar[2] == 7:
							html_class += " redbottom"
					# Date is Sunday in the week before current
					elif date_isocalendar[1] == now_isocalendar[1] - 1 and date_isocalendar[2] == 7:
						html_class += " redbottom"
					# Same date next month is in current week
					try:
						date_next_month = datetime.date(year,month + 1,day)
						date_next_month_isocalendar = date_next_month.isocalendar()
						# Week number is the same as current week number
						if date_next_month_isocalendar[1] == now_isocalendar[1]:
							html_class = "redright"
					except ValueError:
						pass
				date_data = (date_to_add, type_of_day, name_of_day, html_class)
				dates_in_year[day].append(date_data)
			# Except when that date does not exist, e.g. February 30
			except ValueError:
				dates_in_year[day].append("NON-EXISTING DATE")
	
	context = {'year': str(year), 'next': year+1, 'previous': year-1, 'dates_in_year': dates_in_year, 'period': period, 'now': now}
	return context
	
# Main page
def kalindex(request):
	now = datetime.datetime.now()
	year = now.year
	month = now.month
	if month < 7:
		period = range(1,7)
	else:
		period = range(7,13)
	# Run function to get calendar dates
	context = get_dates(year, period, now)
	return render(request, 'kalender/index.html', context)

# Earlier or future year page
def kalyear(request, year):
	# If year is not an integer, a 404 error is thrown
	try:
		year = int(year)
	except ValueError:
		raise Http404
	# If year is between 1 and 9999, a calendar is rendered
	if year > 0 and year < 10000:
		now = datetime.datetime.now()
		period = range(1,13)
		# Run function to get calendar dates
		context = get_dates(year, period, now)
		return render(request, 'kalender/index.html', context)
	# If not, a 404 error is thrown
	else:
		raise Http404
	
# Earlier or future half-year page
def kalperiod(request, year, period):
	# If year is not an integer, a 404 error is thrown
	try:
		year = int(year)
	except ValueError:
		raise Http404
	# If year is between 1 and 9999 and period is 1 or 2, a calendar is rendered
	if year > 0 and year < 10000 and (period == "1" or period == "2"):
		if period == "1":
			period = range(1,7)
		elif period == "2":
			period = range(7,13)
		now = datetime.datetime.now()
		# Run function to get calendar dates
		context = get_dates(year, period, now)
		return render(request, 'kalender/index.html', context)
	# If not, a 404 error is thrown
	else:
		raise Http404	
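
The day-by-month grid that get_dates builds is the core of the calendar: one row per day number from 1 to 31 and one column per month, with a marker where the date does not exist. A stripped-down sketch of just that step, using only the standard library, might look like this. The function name `build_grid` is hypothetical, and None stands in for the "NON-EXISTING DATE" marker.

```python
import datetime


def build_grid(year, period):
    """Return {day: [date or None for each month in period]} for days 1-31."""
    grid = {}
    for day in range(1, 32):
        row = []
        for month in period:
            try:
                row.append(datetime.date(year, month, day))
            except ValueError:
                # The date does not exist, e.g. February 30
                row.append(None)
        grid[day] = row
    return grid


if __name__ == "__main__":
    grid = build_grid(2020, range(1, 7))
    print(grid[29])
    print(grid[31])
```

Because each row holds the same day number across all months, a template can emit one table row per dictionary entry, which is how the table body in index.html is laid out.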

The page template index.html looks a little messy (in order to keep the page size small). The template generates a table by going through all the days in the calendar and adding special layout rules for Saturdays, Sundays, holidays, the current week, day and week number, and so on.

Here is the part of it that uses Django's template features. (You can find the rest of the code by using "view source" on https://wallnot.dk/kalender):

{% if period|length == 12 %}
{% if not year == "1" %}<a href="{% url 'kal_year' previous %}" title="Se kalender for året før">« forrige</a>{% endif %}<h1> Kalender for år {{ year }} </h1>{% if not year == "9999" %}<a href="{% url 'kal_year' next %}" title="Se kalender for året efter">næste »</a>{% endif %} <a class="calendartype" href="{% if now|date:"n" == "7" or now|date:"n" == "8" or now|date:"n" == "9" or now|date:"n" == "9" or now|date:"n" == "10" or now|date:"n" == "11" or now|date:"n" == "12" and now|date:"Y" == year %}{% url 'kal_period' year 2 %}{% else %}{% url 'kal_period' year 1 %}{% endif %}" title="Gå til halvårskalender">Til halvårskalender</a>
{% elif period|last == 6 %}
{% if not year == "1" %}<a href="{% url 'kal_period' previous 2 %}" title="Se kalender for halvåret før">« forrige </a>{% endif %}<h1> Kalender for år {{ year }}, første halvår </h1><a href="{% url 'kal_period' year 2 %}" title="Se kalender for halvåret efter">næste »</a> <a class="calendartype" href="{% url 'kal_year' year %}" title="Gå til helårskalender">Til helårskalender</a>
{% else %}
<a href="{% url 'kal_period' year 1 %}" title="Se kalender for halvåret før">« forrige</a><h1> Kalender for år {{ year }}, andet halvår </h1>{% if not year == "9999" %}<a href="{% url 'kal_period' next 1 %}" title="Se kalender for halvåret efter">næste »</a>{% endif %} <a class="calendartype" href="{% url 'kal_year' year %}" title="Gå til helårskalender">Til helårskalender</a>
{% endif %}

<p>I dag er det {{ now|date:"l" }} den {{ now|date }} i uge {{ now|date:"W" }}</p>

<table>
	<thead>
		<tr>
		{% if period|length == 12 %}
			<th>Januar</th>
			<th>Februar</th>
			<th>Marts</th>
			<th>April</th>
			<th>Maj</th>
			<th>Juni</th>
			<th>Juli</th>
			<th>August</th>
			<th>September</th>
			<th>Oktober</th>
			<th>November</th>
			<th>December</th>
		{% elif period|last == 6 %}	
			<th>Januar</th>
			<th>Februar</th>
			<th>Marts</th>
			<th>April</th>
			<th>Maj</th>
			<th>Juni</th>
		{% else %}
			<th>Juli</th>
			<th>August</th>
			<th>September</th>
			<th>Oktober</th>
			<th>November</th>
			<th>December</th>		
		{% endif %}	
		</tr>
	</thead>
	<tbody>
	{% for month, monthdays in dates_in_year.items %}
		<tr>
		{% for day in monthdays %}
			<td{% if day.1 == "Helligdag" or day.0|date:"w" == "0" %} class="holy{% if day.3 %} {{ day.3 }}{% endif %}"{% elif day == "NON-EXISTING DATE" %} class="noborder"{% elif year == now|date:"Y" and day.3 %} class="{{ day.3 }}"{% endif %}>
			
				<div title="{{ day.0|date:"l"|capfirst }}" class="weekday{% if day.0|date:"w" == "6" %} saturday{% endif %}{% if day.0|date == now|date %} red{% endif %}">{{ day.0|date:"D"|slice:":1"|upper }}</div>
	
				<div class="datenum{% if day.0|date:"w" == "6" %} saturday{% endif %}{% if day.0|date == now|date %} red{% endif %}">{{ day.0|date:"j" }}</div>

				{% if day.0|date:"w" == "1" %}<div title="Uge {{ day.0|date:"W" }}" class="weeknum{% if day.0|date:"Y W" == now|date:"Y W" %} red{% endif %}">{{ day.0|date:"W" }}</div>{% endif %}

				{% if day.1 == "Helligdag" or day.1 == "Særlig dag" %}<div title="{{ day.2 }}" class="named{% if "/" in day.2 and period|length == 12 %} named-small{% endif %}{% if day.0|date == now|date %} red{% endif %}">{{ day.2 }}</div>{% endif %}
				
			</td>
		{% endfor %}	
	</tr>
	{% endfor %}	
	</tbody>
</table>

A simple visitor counter

At https://wallnot.dk/count I have set up a visitor counter.

It counts a visit to the page when:

  • The newest visitor is not the same as the previous visitor

The data model in models.py defines a counter, the IP address of the last visitor and the time the counter was last updated:

from django.db import models
from django.utils import timezone

class Counter(models.Model):
    count = models.PositiveIntegerField('Besøgende nummer')
    last_ip = models.GenericIPAddressField('Sidste besøgendes IP-adresse')
    date = models.DateTimeField(default=timezone.now, editable=False)

In views.py I define the logic for when to update the counter. I use a Django module, ipware, to find the user's IP address:

from django.shortcuts import render
from .models import Counter
from ipware import get_client_ip

def countindex(request):
	# Get current count
	try:
		counter = Counter.objects.get(pk=1)
	# If a count does not exist (first visit to site), one is created
	except Counter.DoesNotExist:
		counter = Counter(pk=1, count=0, last_ip='0.0.0.0')
		counter.save()

	# Get user IP
	client_ip, is_routable = get_client_ip(request)

	# If user IP exists, check whether user is identical to last user
	# (If no user IP, nothing happens)
	if client_ip is not None:
		# Check whether user is identical to last user
		try:
			Counter.objects.get(last_ip=client_ip)
		# If not, one is added to visitor count and IP is saved
		except Counter.DoesNotExist:
			counter.count += 1
			counter.last_ip = client_ip
			counter.save()
	context = {'ip': client_ip, 'counter': counter}
	return render(request, 'vcounter/index.html', context)
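
Stripped of Django, the counting rule is small enough to show on its own. The sketch below illustrates the same rule as the view: a visit is counted only when an IP address is known and differs from the previous visitor's. The class name `VisitCounter` is hypothetical, and the state lives in memory rather than in a database.

```python
class VisitCounter:
    """Counts visits, ignoring repeat visits from the most recent IP address."""

    def __init__(self):
        self.count = 0
        self.last_ip = None

    def visit(self, client_ip):
        # No IP available: nothing happens
        if client_ip is None:
            return self.count
        # Only count the visit if the visitor differs from the previous one
        if client_ip != self.last_ip:
            self.count += 1
            self.last_ip = client_ip
        return self.count


if __name__ == "__main__":
    counter = VisitCounter()
    # Documentation-range sample addresses
    for ip in ["192.0.2.1", "192.0.2.1", "198.51.100.7", None, "192.0.2.1"]:
        print(ip, counter.visit(ip))
```

Note that a visitor who returns after someone else has visited is counted again, since only the most recent address is remembered.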

Finally there is my template index.html, which shows the user which visitor number she is, along with her IP address:

<h1>Du er besøgende nummer<br>
<strong>{{ counter.count }}</strong></h1>
(Dit ip-nummer er: {{ ip }})

Voila!

Wallnot in version 2.0

One of Wallnot's few (but loyal) users asked for archive and search functionality on Wallnot.

That required a major restructuring of Wallnot from:

  • A site that shows links to a snapshot of free articles from the front pages of Danish online newspapers.

To:

  • A site that continuously archives links to free articles from Danish online newspapers

That requires:

  • An underlying database
  • Ongoing maintenance, so that links that change status from free to paid articles are removed from the site

The new Wallnot has:

  • Search on article headlines
  • An archive that keeps growing
  • Zetland articles and shared Politiken articles from the last couple of years. The Zetland archive is close to complete.
  • A robot that continuously checks links from the last couple of days for changed paywall status
  • The option to filter out Ritzau wire stories and duplicate articles
  • The same speed and simplicity as version 1.

The architecture behind Wallnot version 2

Version 2 of Wallnot is built in Django, while the robots that collect and maintain links are written in Python.

The move to Django itself is actually simple.

In models.py the data model is described, that is, the fields in the underlying database:

from django.db import models
from django.utils import timezone
from django.contrib import admin

# Create your models here.
class Article(models.Model):
	title = models.CharField('Overskrift', max_length=500)
	unique_id = models.CharField('Avisens artikel-id', max_length=20, unique=True, null=True, blank=True)
	date = models.DateTimeField('Publiceringstidspunkt')
	MEDIUM_CHOICES = (
		('politiken', 'Politiken'),
		('berlingske', 'Berlingske'),
		('jyllandsposten', 'Jyllandsposten'),
		('information', 'Information'),
		('kristeligtdagblad', 'Kristeligt Dagblad'),
		('weekendavisen', 'Weekendavisen'),
		('zetland', 'Zetland'),
		('finansdk', 'Finans.dk'),
		('borsen', 'Børsen'),
		('arbejderen', 'Arbejderen'),
	)
	medium = models.CharField('Medie', max_length=30, choices=MEDIUM_CHOICES)
	url = models.URLField('Adresse', max_length=400, unique=True)
	ritzau = models.BooleanField('Ritzautelegram', default=False, null=True, blank=True)
	excerpt = models.CharField('Første sætning', max_length=1000, null=True, blank=True)
	duplicate = models.BooleanField('Dublet', default=False, null=True, blank=True)
	user_reports_paywall = models.BooleanField('Brugerrapporteret paywall', default=False, null=True)
	created_at = models.DateTimeField('Tilføjet den', default=timezone.now, editable=False)

class ArticleAdmin(admin.ModelAdmin):
	list_display = ('title','unique_id','ritzau','duplicate','excerpt','date')
	list_filter = ('medium', 'user_reports_paywall', 'ritzau','duplicate')
	search_fields = ['title', 'unique_id', 'excerpt']

In addition, a view has to be built that describes the query to the database. Here it is in an abbreviated version without the logic behind user reporting of links behind a paywall:

from django.shortcuts import render
from django.core.paginator import Paginator
import requests
import json
from .models import Article

def index(request):
	articles = Article.objects.order_by('-date')
	searchterm = request.GET.get('q')
	medium = request.GET.get('m')
	ritzau = request.GET.get('r')
	duplicates = request.GET.get('d')
	newwindow = request.GET.get('w')
	if searchterm:
		firstsearchcharacter = searchterm[:1]
		# Exclude queries by adding ! to searchterm
		if firstsearchcharacter == "!":
			searchterm = searchterm[1:]
			articles = articles.exclude(title__iregex=searchterm)
			searchterm = "!" + searchterm
		# Perform normal regex-enabled search
		else:
			articles = articles.filter(title__iregex=searchterm)
	if medium:
		articles = articles.filter(medium=medium)
	if ritzau:
		articles = articles.exclude(ritzau=True)
	if not duplicates and not medium:
		articles = articles.exclude(duplicate=True)
	paginator = Paginator(articles, 80)
	page_number = request.GET.get('page')
	page_obj = paginator.get_page(page_number)
	context = {'request': request, 'page_obj': page_obj, 'medium': medium, 'searchterm': searchterm, 'ritzau': ritzau, 'newwindow': newwindow, 'duplicates': duplicates}
	return render(request, 'wall/index.html', context)
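
The search convention in the view, where a leading "!" turns the query into an exclusion, can be tried outside Django with the standard `re` module. This is a sketch of the same idea on a plain list of titles, not the database query itself: `title__iregex` is evaluated by the database, whose regular expression dialect may differ from Python's. The function name `search_titles` and the sample titles are made up.

```python
import re


def search_titles(titles, searchterm):
    """Filter titles by a case-insensitive regex; a leading "!" excludes matches."""
    if not searchterm:
        return list(titles)
    exclude = searchterm.startswith("!")
    pattern = re.compile(searchterm[1:] if exclude else searchterm, re.IGNORECASE)
    # Keep a title when "it matches" and "we are excluding" disagree
    return [title for title in titles if bool(pattern.search(title)) != exclude]


if __name__ == "__main__":
    titles = ["Corona lukker skolerne", "Ny aftale om klima", "Klimaloven vedtaget"]
    print(search_titles(titles, "klima"))
    print(search_titles(titles, "!klima"))
```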

Finally, a template is written that turns the data into HTML. Here, for example, is the quite short bit of code that outputs the article links on the page:

{% for article in page_obj %}
	{% ifchanged article.date|date %}<h3>{{ article.date|date }}</h3>{% endifchanged %}
	<p>{{ article.date|date:"H:i" }}: <a href="{{ article.url }}"{% if newwindow %} target="_blank"{% endif %}>{{ article.title }}</a> {% if article.ritzau %}<small><sup> ritzau </sup></small> {% endif %}{% if article.duplicate and not medium %}<small><sup> dublet </sup></small> {% endif %}<img title="Giv besked hvis artiklen er bag en paywall" id="{{ article.id }}" class="myBtnt" src="{% static "wall/alert.svg" %}"/></p>
{% endfor %}

Enjoy the new Wallnot!

A small Google crawler

For Wallnot I wanted to get hold of every Zetland story that Google has indexed.

For that purpose I wrote a small program that goes through Google's search results. The program pauses briefly between each page of search results it fetches. That is because Google, paradoxically enough, apparently is not fond of robots itself.

import requests
from bs4 import BeautifulSoup
import time
import random

linkcollection = []
def google_results(url):
	try:
		result = requests.get(url)
		soup = BeautifulSoup(result.text, "lxml")
		links = soup.find_all('a')

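		for link in links:
			# Not every <a> tag has an href attribute
			full_link = link.get('href', '')
			if "zetland.dk/historie/" in full_link:
				# Address without Google's redirect wrapper (computed but not used; the raw link is saved)
				url = full_link[full_link.find("q=")+2:full_link.find("&")]
				linkcollection.append(full_link)
				print(full_link)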
		next_page = soup.find('a', attrs={'aria-label': 'Næste side'})
		time_to_sleep = random.randrange(3,7)
		print("Sleeping " + str(time_to_sleep) + " seconds")
		time.sleep(time_to_sleep)
		google_results('https://www.google.com'+next_page['href'])
	except TypeError:
		print("No more results it seems")

url = 'https://www.google.com/search?q=site:zetland.dk/historie'
google_results(url)

with open("./googlelist.txt", "wt", encoding="utf8") as fout:
	fout.write(str(linkcollection))
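
Judging from the slicing on "q=" and "&" in the script, the links in Google's result page were wrapped in a redirect of the form /url?q=<address>&... at the time. Assuming that format, the standard library's `urllib.parse` can pull out the address without relying on character positions. The function name `extract_target` and the sample link are hypothetical, and Google's markup may well have changed since.

```python
from urllib.parse import parse_qs, urlparse


def extract_target(href):
    """Return the value of the q parameter in a redirect link, or None if absent."""
    query = parse_qs(urlparse(href).query)
    targets = query.get("q")
    return targets[0] if targets else None


if __name__ == "__main__":
    # Made-up example of a wrapped link
    sample = "/url?q=https://www.zetland.dk/historie/abc123&sa=U&ved=xyz"
    print(extract_target(sample))
```

Links without a q parameter give None, so they can be skipped or saved unchanged.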

My adventure with Facebook

I found it unpleasant to see how Facebook let various companies match their information about me (e-mails, phone number, that sort of thing) with my Facebook account.

This involved both big, ugly companies:

And small, sweet, rights-oriented NGOs:

My idea for a solution was:

  • To delete my phone number from my Facebook profile
  • To create an e-mail address specifically for my Facebook profile (let's say: facebookholderojemed@helmstedt.dk)

…And then I hoped that the "matches" would stop.

But: they didn't.

How can that be?

Facebook keeps every e-mail address you have ever had linked to your account, including the ones you delete yourself! And they let advertisers match their information about YOU with the information you thought you had deleted.

(You can see for yourself by using Facebook's function for downloading a copy of your data.)

Then my real adventure began:

My attempt to have old e-mail addresses deleted by Facebook

After answering a sea of questions, I was allowed to fill in a contact form:

Select the product that you need help with : Facebook
What can we help you with? : I want to manage my data
Select one of the following options : I have a different objection to the use of my data
Full name : Morten Helmstedt
Please provide your best contact email address : facebookholderojemed@helmstedt.dk
Where do you live? : Denmark

What data processing activity or activities are you objecting to? : I am objecting to Facebook matching my personal information with information uploaded by advertisers and do not consent to Facebook allowing advertisers to do this.
Even though I have changed my e-mail address to an address only used for Facebook communication and have deleted my phone number and additional e-mail addresses from Facebook, Facebook still matches my information with lists from advertisers. See https://helmstedt.dk/Clipboard01.png for a screenshot.
I would like to be informed in what way advertisers are able to match their lists with my Facebook profile, when no information from my Facebook profile could be in possession by these advertisers after I changed my Facebook profile e-mail and phone number.

If this matching is done by Facebook keeping deleted information from my profile, I do not consent to Facebook keeping this information and I request that this information be deleted, as Facebook no longer has any valid grounds or consent for keeping this information.

Please explain how this processing impacts you. : I have a right to control my personal data according to the EU GDPR regulations and have not consented to Facebook matching my personal data with personal data from advertisers.

By submitting this notice, you represent that all of the information you’ve provided is true and accurate. : I agree

At first that was not much help. Facebook merely confirmed its practices:

Hi,

Thanks for contacting us.

To build a product that connects people across continents and cultures, we need to make sure that everyone can afford it. Advertising lets us keep Facebook free. You can’t opt out of ads altogether because ads are what keep Facebook free, but you do have different options to control how your personal data can and can’t be used to show you ads. They’re all found in ad preferences:
https://www.facebook.com/ads/preferences/?ref=CR

Please note that we do not tell advertisers who you are or sell your information to anyone.

There are a few ways that advertisers can reach you with ads on Facebook:

Information from your use of Facebook

When you use Facebook, you can choose to share things about yourself, such as your age, gender, home town or friends. You can also click or like posts, Pages or articles. We use this information to understand what you might be interested in and hopefully show you ads that are relevant. If a bike shop comes to Facebook wanting to reach female cyclists in Liverpool, we can show their ad to women in Liverpool who liked a Page about bikes. But here’s what’s key: these businesses don’t know who you are. We provide advertisers with reports about the kinds of people seeing their ads and how their ads are performing, but we don’t share information that personally identifies you. You can always see the “interests” assigned to you in your ad preferences, and if you want, remove them.

Information that an advertiser shares with us

In this case, advertisers bring us their customer information so they can reach the same people on Facebook. These advertisers might have your email address from a purchase you made, or from some other data source. We find Facebook accounts that match that data, but we don’t tell the advertiser which accounts were matched. In ad preferences (https://www.facebook.com/ads/preferences/) you can see which advertisers with your contact information are currently running campaigns – and you can click the top right-hand corner of any ad to hide all ads from that business.

Information that websites and apps send to Facebook

Some of the websites and apps you visit may use Facebook tools to make their content and ads more relevant and better understand the results of their ad campaigns. For example, if an online retailer is using Facebook pixel, they can ask Facebook to show ads to people who looked at a certain style of shoe or put a pair of shoes into their shopping basket. If you don’t want this data used to show you ads, you can turn it off in ad preferences.

You can learn more about Facebook pixel and how it works here:
https://www.facebook.com/business/learn/facebook-ads-pixel/?ref=CR

You can decide which parts of your profile you want to be used for ad targeting in the Information section under “About you”. You can remove yourself from interests under “Interests” and categories under “Your categories”. You can also turn off ads that use data from apps or websites that you visit in the Ads settings section under “Ads based on use of websites and apps”.

The “How is this information shared?” section of our Data Policy also discusses in more detail how ads work on Facebook: https://www.facebook.com/about/privacy/?ref=CR

The form that you submitted allows EU residents to report objections to certain types of processing of their personal data under the EU General Data Protection Regulation (GDPR). If you want to object to a specific type of data processing listed in our Data Policy, please visit the Help Centre to learn more about making an objection under GDPR and in what circumstances an objection may be successful:
https://www.facebook.com/help/2069235856423257/?ref=CR

Before you submit another objection, you may want to learn more about our legal bases for processing data, including the instances where the processing is necessary for our legitimate interests (or that of a third party) or for a task carried out in the public interest: https://www.facebook.com/about/privacy/legal_bases/?ref=CR

We hope this helps, but please let us know if you have any other questions.

Thanks,
Ryan,
Privacy Operations
Facebook

I then specifically asked Facebook to delete my old email addresses, since I had a hard time seeing any valid purpose whatsoever in storing this data. Here is the reply I got:

Hi Morten,

Thanks for following up with us.

From your report it appears you would like to delete information from your Facebook account.

Deleting your information from Facebook
You can delete specific data points from your Facebook account via your Activity Log. You can access your Activity Log and to choose to delete certain data points by accessing your Settings > Your Facebook Information > Activity Log.
It appears from your report you are already aware of this, but please note you can choose to permanently delete your profile, photos, posts, videos, and everything else you’ve added at any time by deleting your Facebook account. You can follow the steps outlined in the following Help Center article to understand how to permanently delete your account:
https://www.facebook.com/help/224562897555674

Deleting previous email addresses
From your report it appears you would like us to delete the email addresses previously associated with your account.
As explained in our Data Policy (https://www.facebook.com/policy.php (https://www.facebook.com/about/privacy/update#legal-requests-prevent-harm)), we need to process certain information in order to detect and prevent spam and other bad experiences on Facebook, maintain the integrity of our Products, and promote safety and security on and off the Facebook Products.
In relation to your specific request, the information you have asked to be deleted is necessary for us to maintain the integrity of our Products, and promote safety and security on and off the Facebook Products.

[…]

Under Article 17 of the GDPR there are limited grounds on which the erasure of personal data can be obtained. We have reviewed your request in light of the information you have provided to us, and we have found that your request does not meet one of the grounds listed in Article17 of the GDPR. We are therefore unable to take further action on your request to have this data point deleted.

Please note that you have a right to contact the Irish Data Protection Commission, which is Facebook’s lead supervisory authority (please see www.dataprotection.ie (http://www.dataprotection.ie/))

You also have the right to contact your local data protection authority and to bring a claim before the courts.

We trust this addresses your query but please let us know if you have any further questions.

Thanks,
Elsa
Privacy Operations
Facebook

So Facebook believes that:

In relation to your specific request, the information you have asked to be deleted is necessary for us to maintain the integrity of our Products, and promote safety and security on and off the Facebook Products

Aha!

I tried playing the (second-to-)last card I had in my hand:

Dear Elsa
Under GDPR I have a right to rectify incorrect data. As my previous e-mail addresses are no longer my e-mail addresses, they should be deleted. You have no valid use for e-mail addresses that I no longer use. Please delete those e-mail addresses and confirm.
Best regards,
Morten

Elsa from Facebook could not follow that logic:

Hi Morten,

Thanks for following up with us and for your patience.

As explained in our Data Policy (https://www.facebook.com/policy.php), we need to process certain information in order to detect and prevent spam and other bad experiences on Facebook, maintain the integrity of our Products, and promote safety and security on and off the Facebook Products. Please note that all these measures are taken to make the platform safer for our users.

If your personal data is inaccurate, you have the right to have the data rectified by Facebook. In this case, there is no suggestion that we are processing inaccurate data, but rather retaining your previous email address for the reasons explained above.

We won’t be able to take any further action on this request, but please note that you have a right to contact the Irish Data Protection Commission, which is Facebook’s lead supervisory authority (please see www.dataprotection.ie (http://www.dataprotection.ie/))

Best regards,
Elsa
Privacy Operations
Facebook

To Facebook, incorrect personal data does not mean data that no longer, or never, described the person correctly. Any data that has reached Facebook is, by definition, correct raw material for Facebook’s surveillance.

The moral?

I have now deleted my Facebook account. I encourage you to do the same.

Britta Nielsen generator

You see a large sum of money and think: how many Britta Nielsens does that actually amount to?

Now you can get the answer with the calculator at https://wallnot.dk/britta
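
The arithmetic behind such a calculator is a single division. Here is a minimal sketch in Python, assuming that one Britta Nielsen equals roughly 117 million DKK, the amount widely reported in the case. The post does not say which value the calculator on wallnot.dk uses, so both the constant and the function name are hypothetical.

```python
# Hypothetical unit: roughly 117 million DKK, the amount widely reported
# in the Britta Nielsen case. The value used on wallnot.dk is not stated.
BRITTA_NIELSEN_DKK = 117_000_000


def amount_in_britta_nielsens(amount_dkk, unit_dkk=BRITTA_NIELSEN_DKK):
    """Return how many Britta Nielsens an amount in DKK corresponds to."""
    if unit_dkk <= 0:
        raise ValueError("unit_dkk must be positive")
    return amount_dkk / unit_dkk


if __name__ == "__main__":
    # Example: express a budget item of 1 billion DKK in Britta Nielsens
    print(f"{amount_in_britta_nielsens(1_000_000_000):.1f} Britta Nielsens")
```

Keeping the unit as a parameter makes it easy to swap in another figure if the assumed amount turns out to differ from the one the site uses.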

How to make a free Weekendavisen

Let me reveal something I discovered when I built https://wallnot.dk (which only publishes free articles): Weekendavisen is free!

A small excerpt of a paywalled article from Weekendavisen.dk as nicely structured JSON.

Although https://www.weekendavisen.dk/ looks like a typical Danish online newspaper, with free articles and paywalled articles all jumbled together, Weekendavisen actually publishes all of its content. They probably don’t know it themselves, but the developer at the clever web agency that built their site certainly does.

The paper’s overview of this week’s issue (this week it is https://www.weekendavisen.dk/2019-51/oversigt) contains a fully public JSON string with the paper’s entire content: full text, links to audio readings of the articles, the whole lot.

That is pretty amateurish.

You don’t see it in your browser when you visit the page, but it is there.

I have written a small Python script that generates your own personal Weekendavisen for the current week in a file called index.html. It doesn’t look particularly good, and it contains only the full texts, not images or links to the audio readings. You can work further with the JSON string yourself if you want it to look nice.

I may be ruining it for myself, because if Weekendavisen fixes the error, I will probably have to recode the part of wallnot.dk that shows free Weekendavisen articles.

Enjoy your free Weekendavisen.

# The Danish newspaper Weekendavisen.dk publishes all articles - even those supposedly behind a paywall - as JSON on their homepage.
# This small script creates an index.html file to read all articles from the current edition.

import json

import requests
from bs4 import BeautifulSoup

def weekendavisen():
	# Request the front page and parse it
	data = requests.get("https://weekendavisen.dk")
	soup = BeautifulSoup(data.text, "html.parser")

	# Find the link to the overview of the current edition (skip anchors without an href)
	overviewurl = None
	for a in soup.find_all('a', href=True):
		if "/oversigt" in a['href']:
			overviewurl = a['href']
	if overviewurl is None:
		raise RuntimeError("Found no link to an edition overview on the front page")

	# The edition is the 7 characters after ".dk/", e.g. "2019-51" (assumes an absolute link)
	start = overviewurl.find(".dk/") + 4
	edition = overviewurl[start:start + 7]
	request = "https://weekendavisen.dk/" + edition + "/oversigt"

	# Request the overview page and parse it
	data = requests.get(request)
	result = requests.utils.get_unicode_from_response(data)
	soup = BeautifulSoup(result, "html.parser")

	# The whole edition is embedded as JSON in this script tag
	content = soup.find('script', attrs={'class':'js-react-on-rails-component', 'data-component-name':"IndexPage"})
	if content is None:
		raise RuntimeError("Found no IndexPage JSON on the overview page")
	jsondecode = json.loads(content.string)

	# Iterate through the articles and collect the HTML for each one in a list
	articlelist = []

	for section in jsondecode["sections"]:
		for item in section["items"]:
			# Put the first sentence of the summary in bold
			summary = item["summary"]
			first_sentence_end = summary.find(".") + 1
			summary_output = '<b>' + summary[:first_sentence_end] + '</b> ' + summary[first_sentence_end:]
			title_output = '<h1><big>' + item["title"] + '</big></h1>'
			# Paywalled articles keep the rest of their text in "paidBody"
			if item["type"] == "newsarticleplus":
				article = item["body"] + item["paidBody"]
			else:
				article = item["body"]
			articlelist.append(summary_output + title_output + article)

	return "".join(articlelist)

def htmlgenerator(week_links):
	htmlstart = '''<!DOCTYPE HTML>
	<html>
	<head>
	<meta charset="utf-8"/>

	<title>Weekendavisen</title>

	</head>
	<body>'''

	htmlend = '</body></html>'

	finalhtml = htmlstart + week_links + htmlend

	# Save to disk
	with open("./index.html", "wt", encoding="utf8") as fout:
		fout.write(finalhtml)

week_links = weekendavisen()
htmlgenerator(week_links)
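
The extraction step in the script above can also be tried offline, without hitting the site. The sketch below is not the script's own method: it swaps BeautifulSoup for the standard library's html.parser so it runs without third-party packages. The sample page and its contents are invented for illustration; only the script tag attribute data-component-name="IndexPage" and the field names (sections, items, type, title, body, paidBody) are taken from the script above. The helper names are hypothetical.

```python
import json
from html.parser import HTMLParser


class ComponentJSONParser(HTMLParser):
    """Collect the text inside <script> tags with a given data-component-name."""

    def __init__(self, component_name):
        super().__init__()
        self.component_name = component_name
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("data-component-name") == self.component_name:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_component_json(html, component_name="IndexPage"):
    """Return the decoded JSON embedded for the named component."""
    parser = ComponentJSONParser(component_name)
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no embedded JSON found for " + component_name)
    return json.loads("".join(parser.chunks))


def full_text(item):
    """Join the free and the paid part of an article, as the script above does."""
    if item["type"] == "newsarticleplus":
        return item["body"] + item["paidBody"]
    return item["body"]


def list_titles(edition):
    """Return (type, title) for every article in every section."""
    return [(item["type"], item["title"])
            for section in edition["sections"]
            for item in section["items"]]


# Invented sample page that mimics the structure the script above relies on
SAMPLE_HTML = '''<html><body>
<script type="application/json" class="js-react-on-rails-component" data-component-name="IndexPage">
{"sections": [{"items": [
  {"type": "newsarticle", "title": "Free article", "body": "<p>Free text</p>"},
  {"type": "newsarticleplus", "title": "Paid article", "body": "<p>Teaser</p>", "paidBody": "<p>Full text</p>"}
]}]}
</script>
</body></html>'''

if __name__ == "__main__":
    edition = extract_component_json(SAMPLE_HTML)
    for kind, title in list_titles(edition):
        print(kind, "-", title)
```

From here, any nicer layout is a matter of looping over the same sections and items and writing different HTML around each article.
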

A bit of internet history #2

Another old website I have dug up is https://helmstedt.dk/plague/.

A school project, I think from 8th grade around 1997/1998, which was practice for the project assignment in 9th grade.

Rotating skulls, frames and source references. The last one is the best:

Clara John D: Middelalderbyen.
Dybmose Børge og Frederiksen Knud: Middelalderen.
Tallerud Berndt: Den Sorte Død.
Thiedecke Johnny: Pokker, Pest og Piller.
Tuchman Barbara: Et fjernt spejl.
The Internet.

A bit of internet history #1

More than 20 years ago there was “Mortens kodeside” (“Morten’s code page”). My very first website. Made in Microsoft FrontPage. With frames.

Now it is online again at https://helmstedt.dk/kodeside/

Last updated 7/9/1998.

The site led to my first “real” job as an editorial assistant at Jubii. I was contacted by “funtown.dk” about a banner exchange. We did that, and my 14-year-old self asked whether I could review games for them. I could, and shortly afterwards “funtown.dk” was bought by Jubii and became Jubii Games.

Jubii had its headquarters in the silo on Rahbeks Allé, close to where I lived in Valby, so I was hired by Martin Thorborg and proofread and filled content into a CMS, which was a fairly new kind of thing back then. I also translated vBulletin into Danish for Jubii’s discussion forum; that had to be done directly in the code.

I remember asking for 50 kr. an hour. Martin thought I should get 60.

Enjoy Mortens kodeside.