June 11th 2012. 10am-12.45pm EST
People: Ahmd + Pablo

Check the live notes of the meeting at http://brownbag.me:9001/p/120611pageonex


Ahmd @AhmdRefat
Studies, Cairo, elections
Work at Pageonex and studies

Pablo @numeroteca
Paper with Sasha: montera34.com/personal/pablo/120313_FrontpagevsTwitter.pdf

Work flow:


pageonex.com (redirects to numeroteca.org) and pageonex.org. I bought them at http://gandi.net

Server space
Hard disk space: calculate what we need and estimate the approximate size.
Gandi.net 1 share provides 3 GB.

1 standard visualization: normally 30 days * 6 newspapers = 180 images. 180 * 500 KB = 90,000 KB ≈ 88 MB
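
The arithmetic above can be reproduced with a short Ruby sketch. The 500 KB per image and the 3 GB Gandi share come from these notes; the method names are made up for illustration, and only one stored size per image is counted, so keeping several sizes of each front page would add to the total.

```ruby
# Rough storage estimate for one standard visualization.
# Figures taken from the notes: 500 KB per image, 3 GB per Gandi share.
KB_PER_IMAGE = 500
SHARE_KB = 3 * 1024 * 1024 # 3 GB expressed in KB

# Hypothetical helper: total KB for a visualization of days x newspapers.
def visualization_kb(days:, newspapers:, kb_per_image: KB_PER_IMAGE)
  days * newspapers * kb_per_image
end

# Convert KB to MB (1 MB = 1024 KB), rounded to one decimal.
def kb_to_mb(kb)
  (kb / 1024.0).round(1)
end

total_kb = visualization_kb(days: 30, newspapers: 6)
puts "One visualization: #{total_kb} KB (about #{kb_to_mb(total_kb)} MB)"
puts "Visualizations per 3 GB share: #{SHARE_KB / total_kb}"
```

Under these assumptions one share holds about 34 standard visualizations.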

  • Images png, jpg (each is 500KB),
  • 3 sizes full size, medium format (screen) and thumbnail
  • Different formats of front pages, different sizes on screen:

Not yet.

Libraries for Ruby

Images handling


UI Design
[UI draft https://docs.google.com/presentation/d/1C0XMk14KMNINQFrAnkr6eGFVVSD-2xXuy7lwKRs7jL8/edit]
Bootstrap – http://twitter.github.com/bootstrap/
Mediameter – https://github.com/c4fcm/MediaMeter-Coder

Ahmd: It’s not similar: we are not going to use most of it. Models are created, but not all of them are implemented, or we are not going to use them: https://github.com/c4fcm/MediaMeter-Coder/tree/master/app/models It would take more time to start from it. Nathan could build an API for us to use it. There are concerns about delaying the project by using it. Looking toward the midterm. I prefer to start from scratch: it’d be faster.
The web app is the architecture. Ruby: DB, controllers (for every action: creating users, creating threads, creating front pages, scraping from any media, for the UI, for handling images, displaying).

Pablo: Both projects share the idea: coding articles in news. We should try to work together. I’ll connect with Sasha and Nathan to get feedback on this. My feeling is that it is always better to work on existing tools, but I also understand Ahmd’s wish to “start from clean”. We should take a decision soon. About Mediameter: is the latest version of the code on GitHub? It seems that it hasn’t been updated recently.

Tasks, step by step

  1. Scraping
  2. Storing
  3. Analyze / coding
  4. Display / data vis
  5. UI Design
  6. Legal issues

1. Scraping
As wide an approach as possible.

Ruby scrapers:
-Start with http://en.kiosko.net/
-Newseum http://www.newseum.org/todaysfrontpages/ Difficult code to scrape. Example
In Egypt! http://www.newseum.org/todaysfrontpages/hr.asp?fpVname=EGY_ALT&ref_pge=gal&b_pge=1

Consult with them by email.

-Other newspapers: Check for them! make research at

and other major newspapers
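
As a starting point for the kiosko.net scraper, here is a minimal sketch that only builds the list of image URLs for a thread (a date range and a set of newspapers). The URL pattern is an assumption inferred from the single kiosko.net image link quoted in section 3 of these notes (date, country code, newspaper slug, width); it has not been checked against other newspapers, and the method names are hypothetical. Downloading is left out on purpose.

```ruby
require 'date'

# Assumed pattern, inferred from one example URL in these notes:
#   http://img.kiosko.net/2012/06/11/us/newyork_times.750.jpg
KIOSKO_IMAGE_HOST = "http://img.kiosko.net"

# Hypothetical helper: URL of one front page image.
def kiosko_image_url(date, country, slug, width = 750)
  format("%s/%s/%s/%s.%d.jpg",
         KIOSKO_IMAGE_HOST, date.strftime("%Y/%m/%d"), country, slug, width)
end

# Hypothetical helper: every URL for a date range and a list of
# [country, slug] pairs, ordered by date and then by newspaper.
def front_page_urls(start_date, end_date, papers)
  (start_date..end_date).flat_map do |date|
    papers.map { |country, slug| kiosko_image_url(date, country, slug) }
  end
end

papers = [["us", "newyork_times"]] # the only slug confirmed by the notes
urls = front_page_urls(Date.new(2012, 6, 10), Date.new(2012, 6, 11), papers)
puts urls
```

With 30 days and 6 newspapers this produces the 180 URLs of one standard visualization.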

2. Storing / Database

DB tables:
User: Id, user-name, email, password, thread
Thread: Id, Thread-name, start-date, end-date, newspaper(s)
Image: Id, type, newspaper_id, date, size
Newspaper: Id, newspaper name, country, city
Highlighted areas: image-id, tag(s), user
Area: area-id, X1, Y1, X2, Y2, highlighted_area_id
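
The tables above could be sketched as plain Ruby structs before choosing any database library. This is only an illustration of the relations, not the final models: the module name is hypothetical, the `id` on highlighted areas is an assumption (the Area table refers to it), and the structs sit inside a module because `Thread` is already a core Ruby class.

```ruby
require 'date'

# Hypothetical namespace; it also avoids clashing with Ruby's own Thread class.
module PageOneXSketch
  User = Struct.new(:id, :user_name, :email, :password, :thread_ids, keyword_init: true)
  Thread = Struct.new(:id, :name, :start_date, :end_date, :newspaper_ids, keyword_init: true)
  Newspaper = Struct.new(:id, :name, :country, :city, keyword_init: true)
  Image = Struct.new(:id, :type, :newspaper_id, :date, :size, keyword_init: true)
  # One highlighted area groups one or more rectangles on the same image.
  HighlightedArea = Struct.new(:id, :image_id, :tags, :user_id, keyword_init: true)
  Area = Struct.new(:id, :x1, :y1, :x2, :y2, :highlighted_area_id, keyword_init: true)

  # Rectangles that belong to a given highlighted area.
  def self.areas_for(highlighted, areas)
    areas.select { |area| area.highlighted_area_id == highlighted.id }
  end
end

image = PageOneXSketch::Image.new(id: 1, type: "jpg", newspaper_id: 1,
                                  date: Date.new(2012, 6, 11), size: 500)
story = PageOneXSketch::HighlightedArea.new(id: 1, image_id: image.id,
                                            tags: ["elections"], user_id: 1)
# An L-shaped article stored as two rectangles of the same highlighted area.
areas = [
  PageOneXSketch::Area.new(id: 1, x1: 0, y1: 0, x2: 400, y2: 200, highlighted_area_id: 1),
  PageOneXSketch::Area.new(id: 2, x1: 0, y1: 200, x2: 200, y2: 500, highlighted_area_id: 1)
]
puts PageOneXSketch.areas_for(story, areas).size
```

Keeping Area separate from the highlighted area is what allows several rectangles per coded article.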

Twitter: for later. ToDo. Not yet.

Newspaper: source? Wondering about the scalability of the system. Thinking of other sources: magazines, blogs…

3. Analyze / code
Questions: Non-rectangular news: http://img.kiosko.net/2012/06/11/us/newyork_times.750.jpg how to identify it?
Possible solution:
-Multiple area selection
-Select and remove a part of the selection after selecting the main article

For area selection: http://deepliquid.com/content/Jcrop.html or http://odyniec.net/projects/imgareaselect/ Not using pixels, but coordinates.
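
One way to read "not using pixels, but coordinates" is to store each selection as fractions of the image width and height, so the same area fits the full size, medium and thumbnail versions. The sketch below assumes that reading; the method names and hash keys are hypothetical and are not part of Jcrop or imgAreaSelect.

```ruby
# Convert a pixel selection into coordinates relative to the image size (0.0 to 1.0).
def to_relative(selection, width, height)
  {
    x1: selection[:x1].to_f / width,
    y1: selection[:y1].to_f / height,
    x2: selection[:x2].to_f / width,
    y2: selection[:y2].to_f / height
  }
end

# Convert relative coordinates back to pixels for an image of any size.
def to_pixels(relative, width, height)
  {
    x1: (relative[:x1] * width).round,
    y1: (relative[:y1] * height).round,
    x2: (relative[:x2] * width).round,
    y2: (relative[:y2] * height).round
  }
end

# Selection made on a 750 x 1000 image, redrawn on a 1500 x 2000 image.
relative = to_relative({ x1: 75, y1: 100, x2: 375, y2: 500 }, 750, 1000)
puts to_pixels(relative, 1500, 2000).inspect
```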

Low resolution grid, for later. To facilitate intercoder reliability.
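
A possible sketch of the low resolution grid idea: snap each selection to the grid cells it touches, then compare the cells chosen by two coders. The grid size, the relative (0.0 to 1.0) coordinates and the overlap score (shared cells divided by all cells selected by either coder) are assumptions made for illustration; the notes do not fix a reliability measure.

```ruby
require 'set'

# Grid cells ([row, col] pairs) touched by a rectangle given in relative coordinates.
# Products are rounded to 6 decimals to absorb floating point noise.
def grid_cells(rect, cols: 10, rows: 10)
  first_col = (rect[:x1] * cols).round(6).floor
  last_col = [(rect[:x2] * cols).round(6).ceil - 1, cols - 1].min
  first_row = (rect[:y1] * rows).round(6).floor
  last_row = [(rect[:y2] * rows).round(6).ceil - 1, rows - 1].min

  cells = Set.new
  (first_row..last_row).each do |row|
    (first_col..last_col).each { |col| cells << [row, col] }
  end
  cells
end

# Hypothetical agreement score: shared cells / cells selected by either coder.
def cell_agreement(cells_a, cells_b)
  union = cells_a | cells_b
  return 1.0 if union.empty?
  (cells_a & cells_b).size.to_f / union.size
end

coder_a = grid_cells({ x1: 0.0, y1: 0.0, x2: 0.5, y2: 0.5 }, cols: 4, rows: 4)
coder_b = grid_cells({ x1: 0.25, y1: 0.0, x2: 0.75, y2: 0.5 }, cols: 4, rows: 4)
puts cell_agreement(coder_a, coder_b)
```

A coarser grid makes small differences between coders disappear, which is the point of keeping the resolution low.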

4. Display data vis
Check the gigapan, good to navigate.

svg interactive,…
How to handle a huge number of thumbnails: one matrix picture…
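
If the thumbnails end up combined into one matrix picture, the position of each tile can be computed up front. This is a sketch under assumptions the notes do not settle: one row per newspaper, one column per day, and a fixed thumbnail size (75 x 100 here is arbitrary). It only computes the layout; composing the actual image would need an image library.

```ruby
# Layout of a matrix picture: rows are newspapers, columns are days.
# Returns the overall size and the top-left corner of every tile.
def matrix_layout(newspapers, days, thumb_w:, thumb_h:)
  tiles = []
  newspapers.times do |row|
    days.times do |col|
      tiles << { row: row, col: col, x: col * thumb_w, y: row * thumb_h }
    end
  end
  { width: days * thumb_w, height: newspapers * thumb_h, tiles: tiles }
end

layout = matrix_layout(6, 30, thumb_w: 75, thumb_h: 100)
puts "#{layout[:tiles].size} tiles in a #{layout[:width]} x #{layout[:height]} picture"
```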


5. UI
–discussed before–

6. Legal issues
Ask Berkman Center
Center for Civic Media


Ahmd’s main work: scrapers and model building.
