June 11th 2012. 10am-12.45pm EST
People: Ahmd + Pablo
Check the live notes of the meeting at http://brownbag.me:9001/p/120611pageonex
Presentations
Ahmd @AhmdRefat
Studies, Cairo, elections
Work at Pageonex and studies
- Laila from Egypt http://r-shief.org/ @vj_um_amel http://twitterminer.r-shief.org/hq/
Pablo @numeroteca
http://civic.mit.edu
Paper with Sasha montera34.com/personal/pablo/120313_FrontpagevsTwitter.pdf
Work flow:
- Development Blog http://montera34.org/pageonex create categories
- Calendar: set fixed meetings on Mondays and Thursdays. Daily meetings at 5pm (11am EST)
- Google Docs https://docs.google.com/document/d/18F9SSEGU4fVcoXdK2b4Qwp0hpLQixqNs5M8tK0MVL3E/edit
- Etherpads: we’ll use them for the meetings
- Github: repository
- Listserv? Not yet
Technical
Domain.
pageonex.com (redirects to numeroteca.org) and pageonex.org. I bought them at http://gandi.net
Server space
Hard disk space: calculate what we need; estimate the approximate size.
Gandi.net 1 share provides 3 GB.
1 standard visualization: normally 30 days * 6 newspapers = 180 images. 180 * 500 KB = 90,000 KB ≈ 88 MB
- Images: png, jpg (each is about 500 KB)
- 3 sizes: full size, medium format (screen) and thumbnail
- Different formats of front pages, different sizes on screen:
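A rough sketch of the calculation above in Ruby. The 500 KB per image, 30 days, 6 newspapers and 3 GB share come from these notes; the medium and thumbnail sizes are assumed placeholders, not measured values.

```ruby
# Rough storage estimate for one standard visualization.
# From the notes: 30 days, 6 newspapers, about 500 KB per full-size image.
# ASSUMPTION: the medium and thumbnail sizes are illustrative guesses.
DAYS = 30
NEWSPAPERS = 6
SIZES_KB = { full: 500, medium: 150, thumbnail: 20 }.freeze
SHARE_KB = 3 * 1024 * 1024 # one 3 GB share

def images_per_visualization(days = DAYS, newspapers = NEWSPAPERS)
  days * newspapers
end

# Total KB for one visualization, storing every size listed in sizes_kb.
def storage_kb(sizes_kb = SIZES_KB)
  images_per_visualization * sizes_kb.values.sum
end

def kb_to_mb(kb)
  (kb / 1024.0).round(1)
end

full_only_kb = storage_kb({ full: 500 })
all_sizes_kb = storage_kb

puts "Full size only:  #{full_only_kb} KB (#{kb_to_mb(full_only_kb)} MB)"
puts "All three sizes: #{all_sizes_kb} KB (#{kb_to_mb(all_sizes_kb)} MB)"
puts "Visualizations per 3 GB share: #{SHARE_KB / all_sizes_kb}"
```

Storing three sizes per front page raises the estimate above the full-size-only figure, so the numbers are worth redoing once real image sizes are known.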
Traffic
Not yet.
Libraries for Ruby
Images handling
- Carrierwave – https://github.com/jnicklas/carrierwave
- Paperclip – https://github.com/thoughtbot/paperclip
Visualization
- D3 Data-Driven Documents – http://d3js.org/
- Dojo – http://dojotoolkit.org/
- Ext GWT – http://www.sencha.com/
UI Design
[UI draft https://docs.google.com/presentation/d/1C0XMk14KMNINQFrAnkr6eGFVVSD-2xXuy7lwKRs7jL8/edit]
Bootstrap – http://twitter.github.com/bootstrap/
Mediameter – https://github.com/c4fcm/MediaMeter-Coder
Mediameter
Ahmd: It’s not similar: we are not going to use most of it. Models are created, but not all are implemented, or we are not going to use them: https://github.com/c4fcm/MediaMeter-Coder/tree/master/app/models It’ll take more time to start from it. Nathan could build an API for us to use it. Issues about delaying the project by using it. Looking for midterm. Prefer to start from scratch: it’d be faster.
The web app is the architecture. Ruby: DB, controllers (for every action: creating users, creating threads, creating front pages, scraping from any media, for the UI, for handling images, displaying).
Pablo: Both projects share the idea: coding articles in news. We should try to work together. I’ll connect with Sasha and Nathan to get feedback on this. My feeling is that it is always better to work on existing tools, but I also understand Ahmd’s wish to “start from clean”. We should take a decision soon. About Mediameter: is the last version of the code on GitHub? It seems that it hasn’t been updated recently.
Tasks, step by step
- Scraping
- Storing
- Analyze / coding
- Display / data vis
- UI Design
- Legal issues
1. Scraping
As wide an approach as possible.
Ruby scrapers:
-Start with http://en.kiosko.net/
-Newseum http://www.newseum.org/todaysfrontpages/ Difficult code to scrape. Example
In Egypt! http://www.newseum.org/todaysfrontpages/hr.asp?fpVname=EGY_ALT&ref_pge=gal&b_pge=1
Consult with them by email.
-Other newspapers: check for them! Do research at
- nytimes.com (US)
Today’s front page http://www.nytimes.com/pages/todayspaper/index.html
Any day: http://www.nytimes.com/images/2012/06/10/nytfrontpage/scan.jpg starting at January 24, 2002 http://www.nytimes.com/images/2002/01/24/nytfrontpage/scan.jpg
- elpais.com (Spain)
Online (HTML) database of newspaper front pages http://elpais.com/hemeroteca/elpais/2012/06/01/m/portada.html
and other major newspapers
Spain
Egypt
Mexico
US
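The dated URLs noted above for nytimes.com and elpais.com suggest that a scraper could build front page addresses directly from a date. A minimal Ruby sketch, assuming the patterns in those two examples hold for every day (not verified); the method names are made up for illustration.

```ruby
require "date"

# URL patterns copied from the examples in these notes.
# ASSUMPTION: the same pattern is valid for every date in range.
NYT_FIRST_SCAN = Date.new(2002, 1, 24) # earliest scan mentioned in the notes

def nyt_front_page_url(date)
  raise ArgumentError, "no scan before #{NYT_FIRST_SCAN}" if date < NYT_FIRST_SCAN

  date.strftime("http://www.nytimes.com/images/%Y/%m/%d/nytfrontpage/scan.jpg")
end

def elpais_front_page_url(date)
  date.strftime("http://elpais.com/hemeroteca/elpais/%Y/%m/%d/m/portada.html")
end

# One URL per day of a thread, first and last day included.
def nyt_urls_for(first_day, last_day)
  (first_day..last_day).map { |day| nyt_front_page_url(day) }
end

puts nyt_front_page_url(Date.new(2012, 6, 10))
puts elpais_front_page_url(Date.new(2012, 6, 1))
puts nyt_urls_for(Date.new(2012, 6, 1), Date.new(2012, 6, 30)).size
```

The sketch only builds addresses; downloading, error handling for missing days, and the legal questions are left out.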
2. Storing / Data base
DB tables:
User: Id, user-name, email, password, thread
Thread: Id, Thread-name, start-date, end-date, newspaper(s)
Image: Id, type, newspaper_id, date, size
Newspaper: Id, newspaper name, country, city
Highlighted areas: image-id, tag(s), user
Area: area-id, X1, Y1, X2, Y2, highlighted_area_id
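The tables above can be sketched as plain Ruby structs to check how they link together, before choosing a database layer. Several names here are assumptions: `CodingThread` is a hypothetical name (Ruby already has a core `Thread` class, so a model called `Thread` would clash with it), `kind` and `size_name` stand in for "type" and "size", the `*_ids` arrays stand in for "thread" and "newspaper(s)", and `HighlightedArea` gets an `id` so that `highlighted_area_id` has something to point at.

```ruby
require "date"

# Plain-Ruby sketch of the tables listed above (no database, no Rails).
# ASSUMPTION: field names are adapted for illustration, see the lead-in.
Newspaper       = Struct.new(:id, :name, :country, :city, keyword_init: true)
Image           = Struct.new(:id, :kind, :newspaper_id, :date, :size_name, keyword_init: true)
CodingThread    = Struct.new(:id, :name, :start_date, :end_date, :newspaper_ids, keyword_init: true)
User            = Struct.new(:id, :user_name, :email, :password, :thread_ids, keyword_init: true)
HighlightedArea = Struct.new(:id, :image_id, :tags, :user_id, keyword_init: true)
Area            = Struct.new(:id, :x1, :y1, :x2, :y2, :highlighted_area_id, keyword_init: true)

# Collect every rectangle that belongs to one highlighted area.
def rectangles_for(highlighted_area, areas)
  areas.select { |area| area.highlighted_area_id == highlighted_area.id }
end

nyt   = Newspaper.new(id: 1, name: "The New York Times", country: "US", city: "New York")
image = Image.new(id: 1, kind: "jpg", newspaper_id: nyt.id, date: Date.new(2012, 6, 11), size_name: "full")
mark  = HighlightedArea.new(id: 1, image_id: image.id, tags: ["elections"], user_id: 1)
areas = [
  Area.new(id: 1, x1: 0.1, y1: 0.2, x2: 0.6, y2: 0.5, highlighted_area_id: mark.id),
  Area.new(id: 2, x1: 0.1, y1: 0.5, x2: 0.3, y2: 0.7, highlighted_area_id: mark.id),
  Area.new(id: 3, x1: 0.7, y1: 0.1, x2: 0.9, y2: 0.3, highlighted_area_id: 99)
]

p rectangles_for(mark, areas).map(&:id)
```

Keeping several `Area` rows per highlighted area is what lets one tagged article cover more than one rectangle.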
–
Twitter: for later. ToDo. Not yet.
Newspaper: source? Wondering about the scalability of the system. Thinking of other sources: magazines, blogs…
3. Analyze / code
Questions: Non-rectangular news items: http://img.kiosko.net/2012/06/11/us/newyork_times.750.jpg how to identify them?
Possible solution:
-Multiple area selection
-Select and remove a part of the selection after selecting the main article
For area selection: http://deepliquid.com/content/Jcrop.html or http://odyniec.net/projects/imgareaselect/ Not using pixels, but coordinates.
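One way to read "not using pixels, but coordinates" is to store each selection relative to the image size, so the same area applies to the full-size, medium and thumbnail versions. A minimal Ruby sketch, assuming the selection widget reports x, y, width and height in pixels (the exact output of either library is not checked here); all names are hypothetical.

```ruby
# A selection stored as fractions of the image size (0.0 to 1.0).
RelativeArea = Struct.new(:x1, :y1, :x2, :y2)

# ASSUMPTION: the widget gives the top-left corner plus width and height in pixels.
def to_relative(x, y, width, height, image_width, image_height)
  RelativeArea.new(
    x.fdiv(image_width),
    y.fdiv(image_height),
    (x + width).fdiv(image_width),
    (y + height).fdiv(image_height)
  )
end

# Map a stored area back to pixels for an image of any size.
def to_pixels(area, image_width, image_height)
  {
    x: (area.x1 * image_width).round,
    y: (area.y1 * image_height).round,
    width: ((area.x2 - area.x1) * image_width).round,
    height: ((area.y2 - area.y1) * image_height).round
  }
end

# Selection made on a 750 x 1500 px image, redrawn on a 250 x 500 px thumbnail.
area = to_relative(150, 300, 300, 450, 750, 1500)
p area.to_a
p to_pixels(area, 250, 500)
```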
Low resolution grid, for later. To facilitate intercoder reliability.
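A low resolution grid could also cover the multiple-selection and select-then-remove ideas: each selection becomes a set of grid cells, so areas can be added, subtracted and compared between coders. A minimal Ruby sketch; the 10 x 10 grid, the relative coordinates and the Jaccard overlap used as an agreement measure are all assumptions, not decisions from the meeting.

```ruby
require "set"

GRID = 10 # ASSUMPTION: 10 x 10 cells per front page

# Cells [column, row] touched by the rectangle (x1, y1)-(x2, y2),
# given in relative coordinates from 0.0 to 1.0.
def cells_for(x1, y1, x2, y2, grid = GRID)
  last_col = [(x2 * grid).ceil - 1, grid - 1].min
  last_row = [(y2 * grid).ceil - 1, grid - 1].min
  cols = ((x1 * grid).floor)..last_col
  rows = ((y1 * grid).floor)..last_row
  cols.flat_map { |col| rows.map { |row| [col, row] } }.to_set
end

# Share of cells both coders marked, out of all cells either coder marked.
def agreement(cells_a, cells_b)
  union = cells_a | cells_b
  return 1.0 if union.empty?

  (cells_a & cells_b).size.fdiv(union.size)
end

# Non-rectangular article: main block plus a side column.
coder_b = cells_for(0.0, 0.0, 0.6, 0.4) | cells_for(0.0, 0.4, 0.2, 0.8)
# Coder A selects the same shape, then removes a corner (for example an advert).
coder_a = coder_b - cells_for(0.4, 0.2, 0.6, 0.4)

puts coder_a.size
puts agreement(coder_a, coder_b)
```

A coarser grid makes small differences between coders disappear, at the cost of less precise areas.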
4. Display data vis
Check the gigapan, good to navigate.
Html,
svg interactive,…
How to handle a huge number of thumbnails: one matrix picture…
Libraries…
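The "one matrix picture" idea can be sketched as a layout calculation: one row per newspaper, one column per day, so the page loads a single image and shows each thumbnail by its offset. A minimal Ruby sketch, assuming every thumbnail is scaled to the same size (100 x 150 px is a made-up value) and that the matrix image itself is built elsewhere.

```ruby
# ASSUMPTION: all thumbnails share one size; the values are placeholders.
THUMB_WIDTH = 100
THUMB_HEIGHT = 150

# Pixel offset of one thumbnail inside the matrix image.
def thumbnail_offset(newspaper_index, day_index)
  { x: day_index * THUMB_WIDTH, y: newspaper_index * THUMB_HEIGHT }
end

# Overall size of the matrix image.
def matrix_size(newspapers, days)
  { width: days * THUMB_WIDTH, height: newspapers * THUMB_HEIGHT }
end

# 6 newspapers over 30 days, as in the standard visualization.
p matrix_size(6, 30)
# Third newspaper (index 2), eleventh day (index 10).
p thumbnail_offset(2, 10)
```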
5. UI
–discussed before–
6. Legal issues
Ask Berkman Center
Center for Civic Media
Newseum
Kiosko
———
Ahmd’s main work: scrapers and model building.