Presentations and hands-on, Monday meeting

June 11th 2012. 10am-12.45pm EST
People: Ahmd + Pablo

Check the live notes of the meeting at http://brownbag.me:9001/p/120611pageonex

Presentations

Ahmd @AhmdRefat
Studies, Cairo, elections
Work at Pageonex and studies

Laila from Egypt http://r-shief.org/ @vj_um_amel http://twitterminer.r-shief.org/hq/

Pablo @numeroteca
http://civic.mit.edu
Paper with Sasha: montera34.com/personal/pablo/120313_FrontpagevsTwitter.pdf

Work flow:

Development Blog http://montera34.org/pageonex create categories
Calendar: fix meetings on Mondays and Thursdays. Daily meetings at 5pm (11am EST)
Google Docs https://docs.google.com/document/d/18F9SSEGU4fVcoXdK2b4Qwp0hpLQixqNs5M8tK0MVL3E/edit
Etherpads: we’ll use them for the meetings
Github: repository
Listserv? Not yet

Technical

Domain.
pageonex.com (redirects to numeroteca.org) and pageonex.org. I’ve bought them at http://gandi.net

Server space
Hard disk space: estimate the approximate size we need.
Gandi.net 1 share provides 3 GB.

1 standard visualization: normally 30 days * 6 newspapers = 180 images. 180 * 500 KB = 90,000 KB ≈ 88 MB

Images: png, jpg (each is about 500 KB)

3 sizes: full size, medium format (screen) and thumbnail
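
The estimate above can be reproduced with a short Ruby sketch. It only counts full-size images at 500 KB each; the weight of the medium and thumbnail versions is not known yet, so they are left out. The constant and variable names are illustrative only.

```ruby
# Rough storage estimate for one standard visualization.
# Assumption: only full-size images (about 500 KB each) are counted;
# medium and thumbnail versions would add to this.
DAYS = 30
NEWSPAPERS = 6
KB_PER_IMAGE = 500
SHARE_KB = 3 * 1024 * 1024 # one Gandi share: 3 GB expressed in KB

images = DAYS * NEWSPAPERS
total_kb = images * KB_PER_IMAGE
total_mb = total_kb / 1024.0

# Integer division: how many complete visualizations fit in one share
visualizations_per_share = SHARE_KB / total_kb

puts "#{images} images, #{total_kb} KB, about #{total_mb.round(1)} MB"
puts "Full-size visualizations per share: #{visualizations_per_share}"
```

Changing DAYS or NEWSPAPERS gives a quick idea of how many threads one share could hold before more space is needed.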

Different formats of front pages, different sizes on screen:

Lemonde: http://img.kiosko.net/2012/06/11/fr/lemonde.750.jpg

Nytimes: http://img.kiosko.net/2012/06/11/us/newyork_times.750.jpg
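
Both URLs seem to follow the same pattern: date, country code, newspaper name and image width. A minimal Ruby sketch that builds such URLs, assuming the pattern also holds for other newspapers and dates (not checked yet). The method name kiosko_image_url is hypothetical.

```ruby
require "date"

# Builds a kiosko.net image URL from its parts.
# Assumption: the pattern is inferred from the two example URLs only.
def kiosko_image_url(date, country, newspaper, width = 750)
  format("http://img.kiosko.net/%s/%s/%s.%d.jpg",
         date.strftime("%Y/%m/%d"), country, newspaper, width)
end

puts kiosko_image_url(Date.new(2012, 6, 11), "fr", "lemonde")
puts kiosko_image_url(Date.new(2012, 6, 11), "us", "newyork_times")
```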

Traffic
Not yet.

Libraries for Ruby

Images handling

Carrierwave – https://github.com/jnicklas/carrierwave
Paperclip – https://github.com/thoughtbot/paperclip

Visualization

D3 Data-Driven Documents – http://d3js.org/
Dojo – http://dojotoolkit.org/
Ext GWT – http://www.sencha.com/

UI Design
UI draft – https://docs.google.com/presentation/d/1C0XMk14KMNINQFrAnkr6eGFVVSD-2xXuy7lwKRs7jL8/edit
Bootstrap – http://twitter.github.com/bootstrap/
Mediameter – https://github.com/c4fcm/MediaMeter-Coder

Mediameter
Ahmd: It’s not similar: we are not going to use most of it. The models are created, but not all of them are implemented, or we are not going to use them: https://github.com/c4fcm/MediaMeter-Coder/tree/master/app/models It’ll take more time to start from it. Nathan could build an API for us to use it. There are issues about delaying the project in order to use it. Looking to the midterm. Prefer to start from scratch: it’d be faster.
The web app is the architecture. Ruby: DB, controllers (for every action: creating users, creating threads, creating front pages, scraping from any media, for the UI, for handling images, displaying).

Pablo: Both projects share the idea: coding articles in news. We should try to work together. I’ll connect with Sasha and Nathan to get feedback on this. My feeling is that it is always better to work on existing tools, but I also understand Ahmd’s wish to “start from clean”. We should make a decision soon. About Mediameter: is the latest version of the code on GitHub? It seems that it hasn’t been updated recently.

Tasks, step by step

Scraping
Storing
Analyze / coding
Display / data vis
UI Design
Legal issues

1. Scraping
As wide an approach as possible.

Ruby scrapers:
-Start with http://en.kiosko.net/
-Newseum http://www.newseum.org/todaysfrontpages/ Difficult code to scrape. Example
In Egypt! http://www.newseum.org/todaysfrontpages/hr.asp?fpVname=EGY_ALT&ref_pge=gal&b_pge=1

Consult with them by email.

-Other newspapers: check for them! Research at:

nytimes.com (US)
Today’s front page http://www.nytimes.com/pages/todayspaper/index.html
Any day: http://www.nytimes.com/images/2012/06/10/nytfrontpage/scan.jpg starting at January 24, 2002 http://www.nytimes.com/images/2002/01/24/nytfrontpage/scan.jpg
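
The "any day" URL can be generated from a date. A small Ruby sketch, assuming the pattern is stable for every day since January 24, 2002 (only the two dates above were checked). The method and constant names are hypothetical.

```ruby
require "date"

# First date for which a front page scan is known to exist (from the notes).
NYT_FIRST_SCAN = Date.new(2002, 1, 24)

# Builds the URL of the NYT front page scan for a given date.
def nyt_frontpage_url(date)
  if date < NYT_FIRST_SCAN
    raise ArgumentError, "no scan expected before #{NYT_FIRST_SCAN}"
  end
  "http://www.nytimes.com/images/#{date.strftime('%Y/%m/%d')}/nytfrontpage/scan.jpg"
end

puts nyt_frontpage_url(Date.new(2012, 6, 10))
```

A scraper could walk a date range with (start_date..end_date).each and call this method for each day.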

elpais.com (Spain)
Online (HTML) database of newspaper front pages http://elpais.com/hemeroteca/elpais/2012/06/01/m/portada.html

and other major newspapers
Spain
Egypt
Mexico
US

2. Storing / Database

DB tables:
User: Id, user-name, email, password, thread
Thread: Id, Thread-name, start-date, end-date, newspaper(s)
Image: Id, type, newspaper_id, date, size
Newspaper: Id, newspaper name, country, city
Highlighted areas: image-id, tag(s), user
Area: area-id, X1, Y1, X2, Y2, highlighted_area_id
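
The tables above could be sketched as plain Ruby Structs before deciding on a real database layer. This is only an illustration under some assumptions: the thread model is called NewsThread to avoid clashing with Ruby’s built-in Thread class, the image size field is called image_size so it does not override Struct#size, and HighlightedArea gets an id because Area points to it through highlighted_area_id.

```ruby
require "date"

# Plain-Ruby sketch of the tables; field names follow the notes where possible.
User            = Struct.new(:id, :user_name, :email, :password, :thread_ids, keyword_init: true)
NewsThread      = Struct.new(:id, :name, :start_date, :end_date, :newspaper_ids, keyword_init: true)
Image           = Struct.new(:id, :type, :newspaper_id, :date, :image_size, keyword_init: true)
Newspaper       = Struct.new(:id, :name, :country, :city, keyword_init: true)
HighlightedArea = Struct.new(:id, :image_id, :tags, :user_id, keyword_init: true)
Area            = Struct.new(:id, :x1, :y1, :x2, :y2, :highlighted_area_id, keyword_init: true)

# Number of front pages a thread needs: one per newspaper per day (both ends included).
def expected_image_count(thread)
  days = (thread.end_date - thread.start_date).to_i + 1
  days * thread.newspaper_ids.size
end

example = NewsThread.new(id: 1, name: "Example thread",
                         start_date: Date.new(2012, 6, 1),
                         end_date: Date.new(2012, 6, 30),
                         newspaper_ids: [1, 2, 3, 4, 5, 6])
puts expected_image_count(example)
```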
–
Twitter: for later. ToDo. Not yet.

Newspaper: source? Wondering about the scalability of the system. Thinking of other sources: magazines, blogs…

3. Analyze / code
Questions: Non-rectangular news: http://img.kiosko.net/2012/06/11/us/newyork_times.750.jpg how to identify them?
Possible solution:
-Multiple area selection
-Select and remove a part of the selection after selecting the main article

For area selection: http://deepliquid.com/content/Jcrop.html or http://odyniec.net/projects/imgareaselect/ Not using pixels, but coordinates.
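
One possible reading of "not using pixels, but coordinates" is to store every selection relative to the image size, so that the same selection works on the full-size, medium and thumbnail versions. That reading is an assumption. The Ruby sketch below also shows the "multiple area selection" idea: a non-rectangular article is stored as several rectangles. All names are hypothetical.

```ruby
# A rectangular selection; x1/y1 is the top-left corner, x2/y2 the bottom-right.
Rect = Struct.new(:x1, :y1, :x2, :y2) do
  def area
    (x2 - x1) * (y2 - y1)
  end
end

# Converts a pixel rectangle into coordinates relative to the image (0.0..1.0).
def to_relative(rect, image_width, image_height)
  Rect.new(rect.x1.to_f / image_width, rect.y1.to_f / image_height,
           rect.x2.to_f / image_width, rect.y2.to_f / image_height)
end

# Share of the front page covered by an article.
# Assumption: the rectangles of one article do not overlap each other.
def covered_fraction(rects)
  rects.sum(&:area)
end

# L-shaped article on a hypothetical 750 x 1000 pixel front page
pixel_rects = [Rect.new(0, 0, 750, 250), Rect.new(0, 250, 375, 500)]
relative_rects = pixel_rects.map { |r| to_relative(r, 750, 1000) }
puts covered_fraction(relative_rects)
```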

Low resolution grid, for later. To facilitate intercoder reliability.
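
A rough Ruby sketch of how a low resolution grid could work: each selection, given in relative coordinates (0.0 to 1.0), is turned into the set of grid cells it touches, and two coders are compared by the overlap of their cell sets. The overlap measure used here (shared cells divided by all cells selected by either coder) is just a simple stand-in, not something decided in the meeting. Grid size and method names are assumptions.

```ruby
require "set"

# Grid cells ([row, col] pairs) touched by a rectangle in relative coordinates.
def grid_cells(x1, y1, x2, y2, cols: 10, rows: 10)
  first_col = (x1 * cols).floor
  last_col  = [(x2 * cols).ceil - 1, cols - 1].min
  first_row = (y1 * rows).floor
  last_row  = [(y2 * rows).ceil - 1, rows - 1].min
  cells = Set.new
  (first_row..last_row).each do |row|
    (first_col..last_col).each { |col| cells << [row, col] }
  end
  cells
end

# Agreement between two coders: shared cells / cells selected by either one.
def agreement(cells_a, cells_b)
  union = cells_a | cells_b
  return 1.0 if union.empty?
  (cells_a & cells_b).size.to_f / union.size
end

coder_a = grid_cells(0.0, 0.0, 0.5, 0.25, cols: 4, rows: 4)
coder_b = grid_cells(0.0, 0.0, 0.5, 0.5, cols: 4, rows: 4)
puts agreement(coder_a, coder_b)
```

With a coarse grid, small differences in where each coder clicks fall into the same cells, which is the point of keeping the resolution low.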

4. Display / data vis
Check GigaPan: good for navigation.

Html,
svg interactive,…
How to handle a huge amount of thumbnails: one matrix picture…
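
The "one matrix picture" idea could be sketched as follows: all thumbnails are placed in a single large image, one row per newspaper and one column per day, and each thumbnail is located by its offset. This Ruby sketch only computes the layout, it does not build the image. The thumbnail dimensions and the method name are hypothetical.

```ruby
# Computes where each thumbnail would sit inside one big matrix image.
# Rows are newspapers, columns are days; offsets are [x, y] in pixels.
def matrix_layout(days:, newspapers:, thumb_width:, thumb_height:)
  positions = {}
  newspapers.times do |row|
    days.times do |col|
      positions[[row, col]] = [col * thumb_width, row * thumb_height]
    end
  end
  { width: days * thumb_width,
    height: newspapers * thumb_height,
    positions: positions }
end

layout = matrix_layout(days: 30, newspapers: 6, thumb_width: 75, thumb_height: 100)
puts "Matrix image: #{layout[:width]} x #{layout[:height]} px"
puts "Last newspaper, last day at: #{layout[:positions][[5, 29]].inspect}"
```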

Libraries…

5. UI
–discussed before–

6. Legal issues
Ask Berkman Center
Center for Civic Media

Newseum
Kiosko

———
Ahmd’s main work: scrapers and model building.
