
SCRIBD

You can read but you can't download

Scribd describes itself on its home page as:

“The World’s largest online library. Read, publish, and share documents and written works.”

Nice, but… nowhere does it say that you have to pay for each e-text or document. And what about works in the public domain? They sell those too! Fortunately, the basic idea of the Internet is accessibility (and adaptability, and readability…), so I wrote my own code to download free e-texts for free. They made the HTML of their e-texts incredibly dirty; I made it clean. They made everything inaccessible; I made it accessible. They made e-texts (almost) unreadable and unformatted; I made them extremely easy to read and adapt, with proper formatting.

And last but not least: the largest online library is www.lib.ru (maintained by a good friend of mine, Maxim Moshkow), where everything “open” really is open and free. Scribd may be the largest scrapbook, but it is not an online library.

Sample

Here is one sentence as it should be marked up:

<p>…po reč — ono što je danas izašlo u Državnim Novinama:"Kroz 120 dana završiće se izgradnja Integrala. Blizu je veliki, istorijski trenutak, kada će se prvi Integral vinuti u kosmi…</p>

And here is how Scribd marks it up, in its own special, incredible way:

<span class=a style="left:1679px;top:871px;word-spacing:1px;letter-spacing:-1px">po re</span></div><div class="ff1" style="font-size:59px"><span class=a style="left:1900px;top:869px">и</span></div><div class="ff0" style="font-size:71px"><span class=a style="left:1989px;top:871px;word-spacing:1px;letter-spacing:-1px">- ono љto je danas izaљlo u Drћavnim Novinama:</span><span class=a style="left:486px;top:954px;word-spacing:1px;letter-spacing:-1px">"Kroz 120 dana zavrљi</span></div><div class="ff1" style="font-size:59px"><span class=a style="left:1414px;top:953px">ж</span></div><div class="ff0" style="font-size:71px"><span class=a style="left:1458px;top:954px;word-spacing:1px;letter-spacing:-1px">e se izgradnja Integrala. Blizu je veliki, istorijski</span><span class=a style="left:486px;top:1038px;word-spacing:2px;letter-spacing:-1px">trenutak, kada</span></div><div class="ff1" style="font-size:59px"><span class=a style="left:1149px;top:1036px">ж</span></div><div class="ff0" style="font-size:71px"><span class=a style="left:1193px;top:1038px;word-spacing:1px;letter-spacing:-1px">e se prvi Integral vinuti u kosmi</span></div><div class="ff1" style="font-size:59px">
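The markup above is not hopeless: every fragment carries its position in the `style` attribute (`left`, `top`), and the stray letters set in the second font (`ff1`) can be repaired. The four substitutions in this sample (и → č, ж → ć, љ → š, ћ → ž) are exactly what you get when Windows-1250 (Central European) bytes are decoded as Windows-1251 (Cyrillic); that is an inference from this one sample, not documented Scribd behaviour. A minimal sketch in Python, standard library only; the names `extract_spans` and `fix_mojibake` and the regular expression for the style attribute are illustrative assumptions.

```python
import re
from html.parser import HTMLParser

# Assumed pattern: each text span carries "left:NNNpx;top:NNNpx" in its style.
POS_RE = re.compile(r"left:(\d+)px;top:(\d+)px")


class SpanCollector(HTMLParser):
    """Collects (left, top, text) for every absolutely positioned <span>."""

    def __init__(self):
        super().__init__()
        self.spans = []
        self._current = None  # [left, top, text] of the span being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        match = POS_RE.search(dict(attrs).get("style") or "")
        if match:
            self._current = [int(match.group(1)), int(match.group(2)), ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.spans.append(tuple(self._current))
            self._current = None


def extract_spans(markup):
    """Returns the positioned text fragments in document order."""
    parser = SpanCollector()
    parser.feed(markup)
    parser.close()
    return parser.spans


def fix_mojibake(text):
    """Re-reads Cyrillic characters as Windows-1250 (assumes Latin-script text)."""
    out = []
    for ch in text:
        if "\u0400" <= ch <= "\u04ff":
            try:
                ch = ch.encode("cp1251").decode("cp1250")
            except UnicodeError:
                pass  # not mappable between the two code pages: keep as-is
        out.append(ch)
    return "".join(out)
```

Fed the markup above, `extract_spans` returns fragments such as `(1900, 869, "и")`, and `fix_mojibake("и")` gives `"č"`. Only characters in the Cyrillic block are touched, so ASCII and already-correct letters pass through unchanged; a genuinely Cyrillic document would need this step switched off.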

If you strip the HTML tags (which is very easy to do), you get something like this:

po re

č

- ono što je danas izašlo u Državnim Novinama:"Kroz 120 dana završi

ć

e se izgradnja Integrala. Blizu je veliki, istorijskitrenutak, kada

ć

e se prvi Integral vinuti u kosmi
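Naive tag stripping loses the layout: visual line breaks vanish (hence "istorijskitrenutak"), and so does the space before a split-off letter ("kada" + "ć" + "e se" should read "kada će se"). One way to restore both is to rebuild the lines from the `left`/`top` coordinates. A minimal sketch in Python, assuming that spans whose `top` values differ by at most 5 px share a line (the `ff1` letters sit 1 to 2 px higher in the sample) and an average glyph width of about 44 px, which is roughly what the sample's coordinates imply at `font-size:71px`. Both numbers and the name `rebuild_text` are hypothetical and would need tuning for other documents.

```python
def rebuild_text(spans, line_tolerance=5, glyph_width=44):
    """Reassembles (left, top, text) fragments into running text.

    line_tolerance: max difference in `top` (px) for spans on the same line.
    glyph_width: assumed average character width (px), used to guess spaces.
    """
    lines = []  # each entry: [anchor_top, [(left, text), ...]]
    for left, top, text in sorted(spans, key=lambda s: (s[1], s[0])):
        if lines and abs(top - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((left, text))
        else:
            lines.append([top, [(left, text)]])

    rebuilt = []
    for _, fragments in lines:
        fragments.sort()  # left to right
        parts = []
        expected_end = None
        for left, text in fragments:
            # A gap wider than half a glyph is taken to be a word space.
            if expected_end is not None and left - expected_end > glyph_width / 2:
                parts.append(" ")
            parts.append(text)
            expected_end = left + len(text) * glyph_width
        rebuilt.append("".join(parts))
    # Visual line breaks become single spaces.
    return " ".join(rebuilt)
```

On the nine fragments of the sample sentence (with the letters already repaired), this restores "istorijski trenutak", "kada će se" and "po reč - ono" while keeping "završiće" in one piece. Because the font is proportional, a fixed glyph width is only a rough guess and can misplace spaces in other passages.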

How much energy and know-how went into turning the accessible into the inaccessible and the readable into the unreadable? No comment. But the basic philosophy of the Internet won again: you can't hide anything once you have decided to publish it online.

If you desperately need an e-text from there and it is in the public domain, write to me and I'll help you. Cheers.