OAI-PMH import with YaCy:

YaCy can import Dublin Core metadata from OAI-PMH data sources. A search portal for over 2000 OAI sources is available at http://oai.yacy.net, hosted by the Liebel Lab at the Karlsruhe Institute of Technology on a single four-core server with 8 GB RAM. You can try the OAI Book Search there.

The search result formatting may not be perfect yet, but this is a community effort to create OAI search portal software for everyone. You can easily create and 'own' such an OAIster-like search portal on a home PC with the following steps:
  • download YaCy 0.95 from http://yacy.net
  • extract the tar.gz and run the start script in the release directory for your operating system (startYACY.sh for Linux, startYACY.command for Mac, startYACY.bat for Windows)
  • a web server starts and is available at http://localhost:8080. Open that link now
  • you can do some configuration now, or skip it until later (see below). To go directly to the OAI-PMH import and web index, click the link 'Content Import' on the left menu.
  • now click on "Import OAI-PMH Sources"
  • on the lower part of the page you see a list with the headline "List of 2104 OAI-PMH Servers". Select individual sources with the button to the left of each source, or select all of them with the button to the left of the "Source" headline of the table. Don't use the 'all' button just for fun: this needs at least 3 GB of memory assigned to YaCy (only possible on a 64-bit OS) and about 40 GB of disk space, and the import then takes about 24 hours on a 4-core server. See the 'Enhancements' sections below if you want to do this.
  • click on the button "Load Selected Sources". The import starts; if you selected all OAI sources, please wait about 10 seconds until a response appears. On the lower part of the page you now see a list of sources, each with a marker saying 'loading' or 'finished'. The columns on the right of the table count the number of processed chunks and records.
  • to see the records that have been imported, click on "Crawl Results" on the left menu. Then click on "Crawl Results" on the top menu. You see a list of all domains the import comes from, together with the number of records and a delete button that you can press to remove the recently imported records. Scroll down: at the end of the domain list you see the complete list of all imported records.
  • to search, click on "Search Page" on the left menu. Submit a search. On the result page you get a Google-like result list and, on the right side, navigation menus for domains and authors. To see the author navigation details, click on the "Author Navigation" bar.
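
The 'chunks' and 'records' counted during the import correspond to how the OAI-PMH protocol delivers data: a ListRecords request returns one chunk of records, and a resumption token in the response is used to ask for the next chunk. The following Python sketch illustrates that protocol-level idea only; it is not YaCy's importer code. The function names, the sample response, and the example.org address are made up for illustration.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hand-written sample response (an assumption, not real server output).
SAMPLE_CHUNK = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>chunk-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    The first request names the metadata format; follow-up requests
    carry only the resumption token of the previous chunk.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_chunk(xml_text):
    """Return (records, resumption_token) for one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        records.append({
            "title": [e.text for e in record.iter(f"{{{DC_NS}}}title")],
            "creator": [e.text for e in record.iter(f"{{{DC_NS}}}creator")],
        })
    token = root.find(f".//{{{OAI_NS}}}resumptionToken")
    # An absent or empty token means this was the last chunk.
    token_text = token.text.strip() if token is not None and token.text else None
    return records, token_text


if __name__ == "__main__":
    records, token = parse_chunk(SAMPLE_CHUNK)
    print(records)
    print(list_records_url("http://example.org/oai", token))
```

A harvester would repeat the request with each new token until a response arrives without one; each round trip is one chunk.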

Enhancements: Personal Settings

Without further configuration, YaCy acts as a peer-to-peer search engine: it immediately starts to share the local index with other peers, and other peers share (send) their index with your peer. If you do not want to mix your index with those of others, do the following:

  • click on "Peer Administration" if you still see the search page
  • click on "Admin Console"
  • click on "Basic Configuration". Here you can set the interface language to German, set a peer name and/or change the server port
  • click on "Network Configuration". If you do not want to share your index with other peers, click on "Robinson Mode". Your peer then no longer receives other indexes and no longer sends your index to others. There is one exception: when other peers search, they still get search results from you. If you do not want that either, click on "Private Peer".

Enhancements: Configuration to import all 2000 OAI-PMH data sources:

To import all 2000 OAI servers you need about 3000 MByte of RAM and about 40 GByte of disk space. YaCy's default memory setting is lower. To increase memory usage, do the following:

  • click on "Performance". In "Memory reserved for JVM" you can set the amount of memory that you want to give to your peer. If you want to import all sources from all 2000 OAI-PMH servers, you need about 3000 MByte of RAM, which works only on a 64-bit OS. If you have more RAM, assign more; this will increase speed. But do not assign so much that your computer starts to swap, because performance will then be terrible.
  • If you set a new memory amount, you must restart YaCy. Click on "Re-Start" on the left menu.
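
Before starting a full import, it can help to check the machine against the figures above. The following Python sketch is only a rough pre-flight check under stated assumptions: the helper names are hypothetical, the thresholds are the 3000 MByte and 40 GByte quoted in this section, and the pointer size of the running Python interpreter is used as a stand-in for a 64-bit OS (it says nothing definite about the JVM that YaCy runs in).

```python
import shutil
import struct

REQUIRED_HEAP_MB = 3000   # figure quoted in the text above
REQUIRED_DISK_GB = 40     # figure quoted in the text above


def is_64bit_python():
    """True if this interpreter uses 64-bit pointers (a proxy for a 64-bit OS)."""
    return struct.calcsize("P") * 8 == 64


def enough_disk(free_bytes, required_gb=REQUIRED_DISK_GB):
    """Compare free space in bytes against the required decimal gigabytes."""
    return free_bytes >= required_gb * 1000**3


def preflight(path="."):
    """Collect the checks for the volume containing `path` (assumed data location)."""
    free = shutil.disk_usage(path).free
    return {
        "64bit": is_64bit_python(),
        "disk_ok": enough_disk(free),
        "free_gb": round(free / 1000**3, 1),
    }


if __name__ == "__main__":
    print(f"heap to assign in 'Performance': about {REQUIRED_HEAP_MB} MByte")
    print(preflight())
```

Run it from the directory where YaCy is installed so that the disk check looks at the right volume.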

Questions, Enhancements, Bugs

If you have any questions, new ideas or complaints about YaCy and the OAI-PMH importer functionality, please go to the YaCy forum at http://forum.yacy.de.