This project page concerns the acquisition of the Juilliard Manuscript Collection.
Active: Funper
If you want to be a part of the project, or think you can help by providing expertise, information or other knowledge, contact the user above.
In order to rip the JPEG files that make up a page in a score, you need the absolute path to files that are on the highest zoom level (i.e. maximum resolution).
With the help of Wireshark, the formula of these addresses has been figured out as follows:
without brackets, where:
Now it is easy to figure out the path to the JPEG files of a particular page. If someone wants to rip, e.g., Beethoven's manuscript of the Ninth Symphony, they could do as follows:
Thus the path of these particular JPEG files would be:
There is one known way of obtaining these images. There might be others but this is the only one known right now.
Create a .txt file. You are going to write ALL possible addresses in this file, so start with N from 0 to 2, X < 25 and Y < 25. Your .txt file should contain approximately 1950 entries like this:
On Windows, there are small tools that help build a URL list like this. I use "Extreme Url Generator". It is not free, but it lets you add several variations to the URL address.
In this case, we need to add three variations: N, X and Y.
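As a free alternative, the list can also be produced with a few lines of script. The following is only a minimal sketch: `URL_TEMPLATE` is a hypothetical placeholder (the real address formula has to be substituted), and the helper names `tile_urls` and `write_url_list` are made up for illustration. Only the `TileGroupN` folders and the `5-X-Y.jpg` tile names are taken from this page.

```python
from itertools import product

# Hypothetical placeholder: replace with the real address formula.
URL_TEMPLATE = "http://example.org/{page}/TileGroup{n}/5-{x}-{y}.jpg"


def tile_urls(page, template=URL_TEMPLATE, groups=3, columns=25, rows=25):
    """Return every candidate tile address for one page.

    groups, columns and rows are upper bounds for N, X and Y; many of
    the generated addresses will not exist on the server.
    """
    return [
        template.format(page=page, n=n, x=x, y=y)
        for n, x, y in product(range(groups), range(columns), range(rows))
    ]


def write_url_list(path, urls):
    """Write one address per line, ready for a download manager."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write("\n".join(urls) + "\n")
```

Calling `write_url_list("urls.txt", tile_urls("BEET_ODEJ_1st_movement_p000a"))` writes the list for the first page. For the next page, call it again with the next page name instead of editing the file by hand. Raise `columns` or `rows` if a page turns out to be larger than the default bounds.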
Now you need to feed this .txt file to a website ripper or download manager. I used HTTrack Website Copier. Run the program and make it try to download all files in the list. Not all of these are valid links, so try it a few times on different pages and see how many links are actually used. Optionally, you can weed out the unused links, since it takes the program a long time to check such a large number of them (approximately two thirds of them are useless, depending on the page size).
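One way to see which links were actually used is to look at the names of the tiles that arrived. The sketch below assumes the tiles are named `5-X-Y.jpg` as in the commands further down this page, with X as the column and Y as the row. The function name `grid_size` is made up for illustration.

```python
import re

# Tile names at the highest zoom level look like 5-X-Y.jpg.
TILE_NAME = re.compile(r"^5-(\d+)-(\d+)\.jpg$")


def grid_size(filenames):
    """Return (columns, rows) covered by the downloaded tiles.

    Names that do not look like tiles are ignored. Returns (0, 0)
    when no tile is found.
    """
    coords = [
        (int(match.group(1)), int(match.group(2)))
        for match in map(TILE_NAME.match, filenames)
        if match
    ]
    if not coords:
        return (0, 0)
    columns = max(x for x, _ in coords) + 1
    rows = max(y for _, y in coords) + 1
    return (columns, rows)
```

Once the tiles of a page sit in one folder, `grid_size(os.listdir(folder))` (after `import os`) reports how many columns and rows the page has, so the list for similar pages can be trimmed to that range.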
When you have downloaded the first page, you can replace all the PAGE NAME strings in the .txt file (BEET_ODEJ_1st_movement_p000a) with the next one (i.e. BEET_ODEJ_1st_movement_p000b).
Repeat the procedure until all pages are downloaded.
Use a program that supports panorama processing. I use IrfanView. You could do this manually, through the command prompt, or through scripts. Since the work takes a lot of time to do manually, and since I do not have enough (or any) scripting skills, I used the command prompt to do this relatively quickly.
I used this with IrfanView:
Change the folder name and put the folder in a convenient place, e.g. if it's the first page then you could place it in D:\1. Open up Notepad and write the following:
move D:\1\TileGroup0\*.* D:\1
move D:\1\TileGroup1\*.* D:\1
move D:\1\TileGroup2\*.* D:\1
"C:\Program Files\IrfanView\i_view32.exe" /panorama=(2,D:\1\5-0-0.jpg,D:\1\5-0-1.jpg,D:\1\5-0-2.jpg,D:\1\5-0-3.jpg,D:\1\5-0-4.jpg,D:\1\5-0-5.jpg,D:\1\5-0-6.jpg,D:\1\5-0-7.jpg,D:\1\5-0-8.jpg,D:\1\5-0-9.jpg,D:\1\5-0-10.jpg,D:\1\5-0-11.jpg,D:\1\5-0-12.jpg,D:\1\5-0-13.jpg,D:\1\5-0-14.jpg,D:\1\5-0-15.jpg,D:\1\5-0-16.jpg,D:\1\5-0-17.jpg,D:\1\5-0-18.jpg,D:\1\5-0-19.jpg,D:\1\5-0-20.jpg,D:\1\5-0-21.jpg,D:\1\5-0-22.jpg,D:\1\5-0-23.jpg,D:\1\5-0-24.jpg) /tifc=0 /convert=D:\1\0.tif /silent
If run in the command prompt, this will move all files to the same directory and combine the first column of tiles into a vertical picture (D:\1\0.tif), which is only a part of the full page. Continue this with the rest of the pictures, e.g. 5-1-0.jpg, 5-1-1.jpg ... etc., saving each strip under the next number (1.tif, 2.tif and so on).
When you're done, run this in the command prompt:
"C:\Program Files\IrfanView\i_view32.exe" /panorama=(1,D:\1\0.tif,D:\1\1.tif,D:\1\2.tif,D:\1\3.tif,D:\1\4.tif,D:\1\5.tif,D:\1\6.tif,D:\1\7.tif,D:\1\8.tif,D:\1\9.tif,D:\1\10.tif,D:\1\11.tif,D:\1\12.tif,D:\1\13.tif,D:\1\14.tif,D:\1\15.tif,D:\1\16.tif,D:\1\17.tif,D:\1\18.tif,D:\1\19.tif,D:\1\20.tif,D:\1\21.tif,D:\1\22.tif,D:\1\23.tif,D:\1\24.tif,D:\1\25.tif) /tifc=0 /convert=D:\page1.tif /silent
This will combine the vertical pictures horizontally. The resulting file (page1.tif) will be the complete page.
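Typing one command per strip gets tedious, so the command lines can also be generated. The following is a rough sketch, not a tested workflow: it only builds the text of the commands, using the same IrfanView options as above (`/panorama`, `/tifc`, `/convert`, `/silent`). The function names and the default install path are assumptions, and folder names containing spaces are not handled.

```python
import ntpath  # builds Windows-style paths on any operating system

# Assumed install location, as used in the commands on this page.
IVIEW = r"C:\Program Files\IrfanView\i_view32.exe"


def panorama_command(direction, inputs, output, exe=IVIEW):
    """Build one IrfanView command line (1 = horizontal, 2 = vertical)."""
    files = ",".join(inputs)
    return (
        f'"{exe}" /panorama=({direction},{files}) '
        f"/tifc=0 /convert={output} /silent"
    )


def page_commands(folder, columns, rows, page_file):
    """Return the commands for one page: one per strip, then the final join."""
    commands = []
    strips = []
    for x in range(columns):
        # Stack the tiles of column x from top to bottom into x.tif.
        tiles = [ntpath.join(folder, f"5-{x}-{y}.jpg") for y in range(rows)]
        strip = ntpath.join(folder, f"{x}.tif")
        commands.append(panorama_command(2, tiles, strip))
        strips.append(strip)
    # Join the vertical strips side by side into the complete page.
    commands.append(panorama_command(1, strips, page_file))
    return commands
```

For example, `print("\n".join(page_commands(r"D:\1", 26, 25, r"D:\page1.tif")))` prints 27 lines that can be pasted into the Notepad file after the `move` lines. The column and row counts differ from page to page, so check them against the tiles that were actually downloaded.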
Batch convert all files in IrfanView, changing the dpi to 108, so that the zoom function works correctly in the PDF. Then combine all files into a PDF with your favorite PDF program. The resulting PDF file is around 1.5 megabytes per page.
Continue with the rest of the manuscript.
I propose another way to do this job, using "Contact Sheet II" in Photoshop.
1. Fix the width and height of all the small JPEG files; in this case they should be 256 × 256 pixels. Select the edge tiles, i.e. all files matching 5-*-16 and 5-23-*, and use the batch function to fix them at 256 × 256 pixels (anchored at the left and top).
2. Rename all the files to 001.jpg through 408.jpg. I use the rename tool in Canon Digital Photo Professional. You can do it in Photoshop too.
3. Use Photoshop's automation function Contact Sheet II.
Set 24 columns and 17 rows, place "down first", width 6144, height 4352 (17 rows of 256 pixels), resolution 100, with auto-spacing switched off. Press OK. Done!
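The numbers in this method follow from the tile grid, and the renaming order matters. Here is a small sketch of the arithmetic. It assumes that "down first" fills each column from top to bottom before moving one column to the right, and that tiles are named `5-X-Y.jpg` with X as the column. The function names are made up for illustration.

```python
def sheet_size(columns, rows, tile=256):
    """Return (width, height) in pixels of a sheet of square tiles."""
    return (columns * tile, rows * tile)


def down_first_number(x, y, rows):
    """1-based position of tile (x, y) when columns are filled first."""
    return x * rows + y + 1


def rename_plan(columns, rows):
    """Map each tile name to its numbered name (001.jpg, 002.jpg, ...)."""
    return {
        f"5-{x}-{y}.jpg": f"{down_first_number(x, y, rows):03d}.jpg"
        for x in range(columns)
        for y in range(rows)
    }
```

For the page in this example, `sheet_size(24, 17)` gives `(6144, 4352)` and `rename_plan(24, 17)` has 408 entries, running from `5-0-0.jpg` as `001.jpg` to `5-23-16.jpg` as `408.jpg`. If the rename tool sorts the files differently (for instance `5-10-0.jpg` before `5-2-0.jpg`), the sheet will come out scrambled, so check the order before running Contact Sheet II.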
When combining the pictures, keep the ordering of the pages. If the ordering is confusing or unnecessary in one way or another, post something on the project talk page about it and we will discuss it.
To facilitate tracking of submitted files, please add some future template to the "Misc. Notes" field of the file entry. Also, please mention Juilliard School as the scanner by adding some template to the "Scanner" field.
I have written a small tutorial on how to do this exact thing. Maybe it will help you out. Generoso 19:34, 24 June 2011 (UTC)