For our Expert Q&A on Thursday, November 15, we had Virginia James and Mark Raadgever from the Trove team join us and answer questions on how to get even more from Trove. Thanks again to Virginia and Mark for giving us all their time and insights. We look forward to having the Trove team back again soon.
Please find the transcript of the Q&A and links below.
When: NSW - ACT - VIC - TAS: 8:30-9:30pm AEDT | QLD: 7:30-8:30pm | WA: 5:30-6:30pm | NT: 7:00-8:00pm | SA: 8:00-9:00pm
As of November 2012 there are 316,919,556 pages consisting of 77,907,016 articles available to search in just the old newspapers on Trove alone!
Summary of links from the Q&A:
=====================================================
Transcript of Expert Q&A - Getting even more from Trove:
Our Expert Q&A with Virginia James and Mark Raadgever from the Trove team starts in 15 minutes at 8:30pm AEDT. Topic: Getting even more from Trove. Please ask your questions in a comment below and Virginia or Mark will answer in a following comment.
Comment: IHM: Welcome everyone, the Q&A is now open. Thanks for joining us tonight Virginia and Mark. Please ask your questions.
Q. From Christine: Hi there, my research turned up a story, reported in an article in a WA newspaper, called 'peeps at people'. The story was about a Gallipoli soldier who eventually ended up as a guide at Gallipoli, having had a bad time, prisoner of war etc. I would really like to know who he was.
A. Virginia: @Christine, Wow, what a fascinating story. I take it no names were mentioned.
Q (b): From Christine: There is a photo in Bart Ziino's book of the Irwin family on a pilgrimage to Gallipoli showing them doing a grave rubbing. I cannot find the image in Trove or elsewhere, any ideas?
A. Virginia: @Christine, Hmmm... difficult one. If the image hasn't been contributed separately to Trove apart from the book you won't be able to find it in Trove. So it may not exist as a digitised image on its own. Even if it is, it may have a different title from the caption in the book so it might be in Trove but need different search terms to find it.
Q. From Alexandra: Hi there...I would like to get more involved in the Trove community...I think I understand what the text corrections are about ...but what are the images and the lists? Also is it possible to set up an automated search in Trove - a bit like Google?
A. Mark: @Alexandra In answer to your three questions: 1) Images are photographs (or scans) that are added by individuals to the Trove: Australia in Pictures Flickr group (http://www.flickr.com/groups/pictureaustralia_ppe/). Images uploaded to this group appear in Trove after about 2 weeks, as long as they meet certain requirements. 2) Lists are a list of items found in Trove that a user has collated to make it easier for people to find items that are related, for example http://trove.nla.gov.au/list?id=9823 is a list of newspaper articles about Harold Williams the singer. These can be created by anyone, and are particularly useful for sharing relevant items with other people. 3) You can't set up an automated search in the same way that Google does, however you can subscribe to an RSS feed that notifies you when any new items are added to Trove which match your search. The link to this appears at the bottom of your search results. This feed can be set to just results in a specific zone, or for results across all zones.
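For readers who like to script things, the RSS approach Mark describes could be checked with a few lines of Python. This is only a rough sketch: the sample feed below is made up (the real feed link is the one shown at the bottom of your Trove search results), and it assumes the feed is ordinary RSS 2.0 with `item`, `title` and `link` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of what an RSS 2.0 feed for a saved search might contain.
# In practice you would download the feed linked from your own search results.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example search feed</title>
    <item>
      <title>First example article</title>
      <link>http://example.org/article/1</link>
    </item>
    <item>
      <title>Second example article</title>
      <link>http://example.org/article/2</link>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs for feed items whose link is not in seen_links."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            fresh.append((title, link))
    return fresh


# Links we have already looked at on a previous run.
seen = {"http://example.org/article/1"}
for title, link in new_items(SAMPLE_FEED, seen):
    print(title, link)
```

Keeping the set of links already seen (in a text file, say) means each run reports only the items added since the last check.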
A. Alexandra: Mark - thank you so much - that's great news....I forgot to say thank you to both of you and the whole NLA team ....Trove is so fantastic ...I bore people witless raving about it....
A. Alexandra: Oooh and I've just discovered your Trove search box which I've just put on my blog...that's useful too...thanks.
A. Chris: I have it also, and every now and then someone tells me how handy it is.
Q. From Rosemary: Often multiple lines of an article (eg those in the Family Notices) are missed in the OCR process - is there any way that the OCR can be redone or is the only option to manually correct the entries ourselves?
A. Virginia: @Rosemary, Unfortunately no. We don't have the resources available to send items back for re-scanning so that's why we love our volunteer text correctors so much!
A. Rosemary: The association of the text with the correct line on the page is generally very good but I have seen some where it is completely unrelated - I have to find the entry by looking for the name of interest. Fortunately not common and I can live with that given all the interesting facts I'm learning about my families.
A. Mark: @Rosemary The information on the lines is based on the output of the OCR engine, which gives each line and word on the page a set of co-ordinates (like a map). If there is an error in the information supplied by the OCR engine, this may result in the images not quite matching up. Also, 'fuzzy' matches are often not highlighted in either the article text or the article image; you can use your browser's 'find' feature to find the items in the text. This disassociation can also occur if a user has changed the text without respecting the original lines (this is permissible if there are not enough lines in the OCR text compared to the original article, however the start of the lines should match where the line is highlighted on the page). However, the exact reason for any specific article is difficult to say without viewing the article itself.
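To picture what Mark means by each line having co-ordinates "like a map", here is a small illustrative sketch in Python. It is not how Trove is actually built: the `OcrLine` class, the `highlight_boxes` function and all the numbers are invented for the example. It shows only the general idea that the highlight follows the stored box for a line, not the words a corrector has typed into it.

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    """One OCR line: its text plus where it sits on the page image."""
    text: str
    x: int
    y: int
    width: int
    height: int


def highlight_boxes(lines, term):
    """Return the (x, y, width, height) box of every line whose text contains term."""
    needle = term.lower()
    return [
        (line.x, line.y, line.width, line.height)
        for line in lines
        if needle in line.text.lower()
    ]


# Made-up co-ordinates for a two-line family notice.
page = [
    OcrLine("SMITH.-On the 3rd inst., at her", x=40, y=100, width=300, height=18),
    OcrLine("residence, Mary, wife of John Smith.", x=40, y=120, width=300, height=18),
]

# "Mary" is on the second line, so the second line's box is returned.
print(highlight_boxes(page, "Mary"))

# A correction that moves "Mary" up onto the first line keeps the old boxes,
# so the highlight now lands on the first line of the image instead.
page[0].text = "SMITH.-On the 3rd inst., at her residence, Mary,"
page[1].text = "wife of John Smith."
print(highlight_boxes(page, "Mary"))
```

This is why corrections are best made so that each line of text still starts where its line starts on the page.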
Q. From Fiona: Hi I adore Trove - it's a fabulous resource for researching my family tree. I'm wondering why there isn't much post WW2? Thanks
A. Virginia: @Fiona, that would be because we can only digitise up to 1954 because of copyright issues so that's only 9 years post WWII to digitise.
Q. From Chris: Hi, I'm an avid user of Trove but I need some guidance that you might be able to help with. When I google a name, I can just write it as 'john goopy' + Queensland and for the most part, that's what I will get ...with Mr. Goopy and J Goopy but not much else. When I do this with Trove, I get all manner of Goopys... the bane of my life, don't laugh Cass... How can I get more direct reposnes?
A. Chris: Or responses :-)
A. Mark: @Chris: To reduce the number of results you get in Trove, you may wish to do a phrase search: put the search terms in quote marks (e.g. "John Goopy") to find only those results where the two words appear one after the other. Otherwise, you may like to try a 'near' search with an honorific (e.g. Mr) to find only those results which mention a Mr Goopy. For example "Mr Goopy"~2 will find all articles where Mr and Goopy appear within 2 words of each other. You may also need to use the fulltext: syntax to drop some of the less relevant results - see http://trove.nla.gov.au/newspaper/result?q=fulltext%3A%28%22mr+john+goopy%22~2%29 for an example.
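If you build searches like these often, the three kinds of query Mark mentions (phrase, 'near' and fulltext:) can be assembled with a short Python sketch. The helper names below are made up for illustration. The only things taken from Mark's answer are the query syntax and the address of the newspaper results page, and the URL-encoding described assumes Python 3.7 or later.

```python
from urllib.parse import urlencode

# Address of the newspaper results page, as it appears in Mark's example link.
BASE = "http://trove.nla.gov.au/newspaper/result"


def phrase(words):
    """Phrase search: the words must appear one after the other."""
    return f'"{words}"'


def near(words, distance):
    """'Near' search: the words must appear within `distance` words of each other."""
    return f'"{words}"~{distance}'


def fulltext(expression):
    """Wrap a query in the fulltext: syntax to drop less relevant matches."""
    return f"fulltext:({expression})"


def search_url(query):
    """Build a results-page URL, letting urlencode escape quotes, colons and brackets."""
    return f"{BASE}?{urlencode({'q': query})}"


print(search_url(phrase("John Goopy")))
print(search_url(near("Mr Goopy", 2)))
print(search_url(fulltext(near("mr john goopy", 2))))
```

The last line should produce the same address as Mark's example link; the first two are the simpler phrase and near searches.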
A. Chris: Thank you, a simple change to what I have been trying certainly helps. Wish I'd asked before. Love OCR... it's often more interesting than the initial reports, even more than my typos. It's good that we have so many correcting though, something I do each time I search, even if it's only a few lines. Thank you for your time.
A. Virginia: @Chris, LOL, I think you might be the first person to say the OCR is sometimes more interesting than the original! And thank you so much for your corrections! Every single one counts!
A. Rosemary: I often get a laugh too from the incorrect OCR - things like the "thin" son instead of the "third" son; lots of interesting variations on "passed" in "passed away" too that you would never think of.
A. Mark: @Rosemary I can probably believe almost anything from what I've seen in the OCR!
Q. From Michelle: Is there a possibility of correcting text in a block (like in the Text box) rather than lines separately?
A. Mark: @Michelle This is not something that we intend to introduce at this point in time. The text corrections are completed line-by-line as each line of text is linked to a specific location on the image, and it is not currently possible for us to match these locations with a basic text block rather than the line-by-line system.
A. Mark: This is what allows us to highlight the search results both in the article text and in the original newspaper image
A. Michelle: Thanks @Mark- was just a bit of wishful thinking.
Q. From IHM: Virginia, is there a way to search specifically for images in a newspaper?
A. Virginia: Sure is! Using the Advanced search screen (http://trove.nla.gov.au/ndp/del/search?adv=y) you have the option to search in captions only, and to restrict your search to illustrated articles only. The restriction on this is that advertising does not have the illustrations identified, so this can't be used to find display advertising. You may also use the 'illustrated' facet to restrict your results to illustrated articles, then select a type of illustration.
Q. From Alexandra: Mark/Virginia - how do you choose which newspapers to digitise next?
A. Mark: @Alexandra This process is actually completed by the Newspapers Digitisation Program team (who do an excellent job with the complexities of the digitisation), in consultation with state and territory libraries - more information can be found on the ANDP website at http://www.nla.gov.au/content/selection-policy
Q. From Alison: Do they have plans to collate all the Digger photos under a single catalogue item to make for easier searching. I know newspapers in Brisbane and other port cities often did one page or 2 page spreads of Expeditionary Forces prior to embarkation, using studio portraits, with captions underneath cropped photos just showing the men's faces. I did some text corrections to some Brisbane issues last year, but having trouble now doing an all states search for them for someone else.
A. Virginia: @Alison's question: No, no we don't, but... *big smile* We'd love to have someone create a list of the Digger photos that would make it a single searchable item. So, Alison, there's a challenge for you!
Q. From Kathy: When I ask for an email to be sent to me when a page is finished being digitised, the time frame given is up to 1 month. Does this need to be revised at all? I am sure I've been waiting longer for notification for some pages.
A. Mark: @Kathy One month is a 'normal' timeframe. If the batch that contains the article passes the final quality check, then this should normally be completed within 28 days. If the batch fails the quality check then it can take a lot longer (up to 28 days every time it is resubmitted; the most I've seen is a batch that had to be resubmitted 5 times). If the batch fails the check after you submit your email address, then you will receive notification when the page is available; if it passes, you will get notification that the specific article is available.
Comment: Chloe: I haven't got a specific question but I'm reading through the comments, learning new things from all the other questions. Thanks!
A. Mark: @Chloe Glad to hear that you're getting some tips and ideas from the questions.
A. Virginia: @Chloe, not ignoring you! Are you still getting good stuff from this? If you have any questions you'd like us to answer that you don't want to raise here, just shoot us a message via the Contact Us form.
Q. From Virginia: So... just wondering... does anyone use the rest of Trove? I think the other zones are feeling neglected. ;)
A. Virginia: I only ask because we often have pictures of service men and various ships that may interest family historians.
A. Alexandra: Oh no, don't you worry Virginia, I use Trove for finding books at the public library where I work..we love it...it's the tool we use the most after our own catalogue I think it is fair to say....and I recommend it to all our patrons...I particularly love it for images too...
A. Alexandra: If I have any problems it's usually with the archived websites....I click on the links but I never seem to get anywhere with them...I'm not a very experienced user obviously....
A. IHM: We're always pointing people to books & libraries via Trove and don't tell anyone but we use Trove to search other libraries catalogues instead of their systems... And of course there's all the pictures we find on Trove as well!
A. Wendy: I always do a 'complete' search first cause you never know what else is there :) I also redo the same search occasionally to catch new stuff... found a beautiful photo just last week of my great grandfather with convict leg irons he found on a skeleton. The photo is held by Tas State library. It wasn't there the time before, so has been added recently. My maiden name is Frerk, so that makes things easier ;)
A. Mark: @Alexandra The archived websites can be a bit interesting to use. They are a snapshot of the website at a particular point in time, so many of the links on the pages will be broken. If you have a specific problem, please use the contact us form so that we can provide more specific results.
Q. From Carmel: I heard the other day where some have tried to fix the text and completely changed the story so their ancestor does not sound bad! This does not change the facts at all, but corrections are checked before being authorised, is this correct? I never thought of this and think the idea funny; luckily the original article cannot be changed anyway. I usually read the original and only revert to the OCR if I am having trouble reading it as well.
A. Virginia: @Carmel, Ack! If you see this sort of thing you should contact us via the contact us form and we'll look into it! We've been pretty lucky with not much 'vandalism' and yes it's true the original can't be changed, but still!
A. Virginia: But no, we do not check the text before the corrections are authorised.
A. Mark: In addition to Virginia's comment, the corrections are available immediately that the user clicks 'save', and they are normally searchable within the hour. However, to help protect against that type of behaviour, the original OCR text always remains searchable (Trove searches the original OCR and the most recent update)
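The safeguard Mark describes, searching both the original OCR and the most recent correction, can be pictured with a toy Python example. Everything here is invented for illustration (the field names, the sample text, and the simple "contains" test standing in for a real search index); it shows only why rewriting the text does not hide an article from someone searching for the original wording.

```python
def is_match(article, term):
    """True if term appears in either the original OCR text or the latest correction."""
    needle = term.lower()
    return (
        needle in article["original_ocr"].lower()
        or needle in article["latest_text"].lower()
    )


# A made-up article whose text has been "corrected" to change the story.
article = {
    "original_ocr": "the thin son of Mr Smith was fined",
    "latest_text": "the third son of Mr Jones was praised",
}

# Words from either version still find the article; unrelated words do not.
for term in ("Smith", "Jones", "Brown"):
    print(term, is_match(article, term))
```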
A. Carmel: have not come across it myself but someone did remark that some had tried. Why get in the way of a good story and I would be recorrecting if it was one of mine.
A. Mark: @Carmel There have been a few occurrences where similar things have happened, though not in the way you've mentioned, so it does happen.
Q. From Tony: Hi Guys, when I use the advanced search or a basic search and then the qualifiers on the left side I often want to limit my search to articles 100-1000 words and <1000 words. This requires two searches or one without limiting article length and full of unwanted items. Is there any way to incorporate a search button for both?
A. Virginia: @Tony, give me a sec to digest that one. So you basically want to limit to results that are between 100 to 1000 words and then results less than 1000 words? Isn't that the same thing?
A. Virginia: Oh! Sorry, did you mean greater than 1000 words?
A. Tony: Hi. No that was 100-1000 and over 1000 words... so that would be >1000. Apoloies
A. Tony: Thats Apologies
A. Virginia: @Tony, actually that's what I came round to and there is a way, but we can best answer it if you put in a query via the Contact Us form.
A. Tony: I then use my search parameter in Trove tools to data mine a set series of articles. So having two separate searches of the same topic is a little disorganised
A. Tony: BTW Cass, I submitted my PhD last Thursday...sitting here working on the oral defence for next Thursday.
A. Virginia: @Tony, do you mean you're using the Trove API?
A. Tony: Thanks for the reply Virginia. I will contact you via query page. I have used Trove's API but I now use Wragge Trove Tools
Q. From IHM: Mark. Is it better to sort by relevance or by date?
A. Mark: Inside History Magazine - In almost all circumstances it is better to sort by relevance rather than by date. The main reason for this is that Trove, by default, does a 'fuzzy' search, which will introduce non-exact matches into your search results. Also, when looking at newspapers, the article category does influence the considered relevance of an article. The combination of these two means that when sorting by date, the most relevant results may not be visible due to the non-exact matches and advertising matches pushing them out of the way. Where sorting by date is best is when you are specifically looking for the earliest or latest mention of an item. Sorting by date can also be useful if you are using the fulltext: or text: syntax to turn off the fuzzy searching. If you are looking for results in a specific date range, then the best option is to use the date facet or the date search on the advanced search screen rather than trying to sort by date.
A. Tony: And Mark, Trove searches on every word, subject to OCR quality, while many others like Papers Past only search on headlines. Kudos
A. Mark: @Tony very, very true
Comment: Martyn: The ability to text search newspapers has opened up such a wonderfully rich resource. Thank you so much to all concerned!
Q. From Michelle: I have used a lot of digitised newspapers online from other countries and Trove is by far the most skillful research tool on offer in Australia. It is changing the way we present history and the best thing is- it is free. What do we have to look forward to over the next few years from Trove?
A. Virginia: @ Martyn and Michelle, Thank you so much for your comments! As for the future... we'll always be adding content, including more newspapers and hopefully improving things like Lists and things in the newspapers zone and generally tackling all the enhancements people have asked for!
Comment: Carmel: I really like the facility where new articles that have not been checked can be ordered to be notified when it is available, only problem is I was sent two pages last week and I cannot remember who I was researching! need to keep a log
A. Mark: @Carmel that is always a risk with those notifications, we actually do have a suggestion in on how to improve it...either email the user immediately to say they've requested notification, with the article details (title, newspaper, page, issue date)
A. Mark: or to try and include more information in the notification email.
A. Carmel: I know Trove has brought many of my ancestors to life. There are things in papers that I would not have found if not for Trove unless I had at least one more lifetime to sit and read every paper from start to end.
A. Mark: @Carmel That is one of the things we believe Trove has been most useful for, as it does allow that access which was not previously available
Q. From IHM: Virginia / Mark: What’s the most exciting thing happening in digital history for you?
A. Mark: Inside History Magazine - In terms of digital history, I think the most exciting thing happening in this field is that it is becoming much more mainstream. People are starting to realise that you can do historical research using the online tools, and that these online tools provide significantly more ability to find and analyse the resources that are available than traditional methods do
A. Virginia: For me... in terms of digital history is how Trove is inspiring researchers. We'll soon be presenting a Trove seminar featuring speakers for whom Trove has inspired their research, such as Science in the Australian Women's Weekly and others!
Q. From Alexandra: Just googled Wragge Trove Tools...and I think that's the rest of the night gone for me....why didn't I know about this? I was going to ask about APIs Virginia/Mark...are there any others out there??
A. Virginia: Sorry, that was a short answer, but those are the only ones I can think of off the top of my head at 9:30pm.
A. Alexandra: Thanks for that...sometimes I feel so very ignorant....
A. IHM: Here's the link to Wragge's [Tim Sherratt] Trove tools :: http://discontents.com.au and the transcript to our Q&A with him :: http://ow.ly/fj9Mm | Wragge also wrote a story for us in our Issue 12 about his toolkit and there's follow up coming early in 2013!
A. Virginia: @Alexandra, Don't feel that way, the digital world moves faster than we can keep up sometimes!
A. Carmel: We are all ignorant in many areas but if we do not ask the right questions we will remain ignorant. As they say, there is no such thing as a stupid question; if we do not know the answer we need to ask the questions. I guess in a way the ignorant are the ones who do not want to know the answers.
A. Tony: So true Virginia. When I started my thesis I was using microfiche to access regional newspapers to confirm details...now I sit at home, structure a search and presto. The expectation for empirical evidence in PhDs will skyrocket due to the digital explosion
Q. From Wendy: What is API ?
A. Virginia: @Wendy, it stands for Application Programming Interface and just basically means a way of machine talking to machine.
Q. From Alexandra: So really Virginia when someone comes into the public library where I work and wants to donate photos the best thing would be to suggest that they upload them to Trove - as long as they own the copyright of course, yes?
A. Alexandra: And I'm talking about photos with historical significance of course...not just photos of my guinea pig.
A. Virginia: @Alexandra, Absolutely! What we suggest is that they scan them into digital format and then create a Flickr account if they haven't already got one and then upload the images with good titles and descriptions and tags to our Trove: Australia in Pictures group on Flickr.
Comment: IHM: Thanks again to Virginia and Mark for joining us tonight! That was an excellent Q&A, as they always are with Trove! We’ll publish the questions, answers and links from tonight’s session in a blog post very soon.
A. Alexandra: Thanks guys...really enjoyed it!
A. Mark: Thanks all, it is always a pleasure to answer the questions - if you have any further questions, please use our contact form - http://trove.nla.gov.au/contact
A. Virginia: Thanks! We do love doing these sessions.
A. Wendy: Thank you every one , love these sessions
A. Michelle: I know the session is over but just think back a few years ago, those of you who were researching - trolling through newspapers in hardcopy or on film, spending hours looking for snippets - then compare our research today where we can do such sophisticated searches in our PJs in front of the TV or on our devices coming home in the train. I can remember spending days searching the SMH for references to the 1867 flood at Windsor for my book and can now do the same, with heaps more results, in mere seconds
A. Katrina: Well, I've been using Trove for a couple of years, and I didn't understand any of the above!
A. IHM: Trove, crowdsourcing & digital history - it's changing the way we research and write history, and so much for the better Michelle. I'd like to read that book of yours one day.
Let us know if you have any follow up questions Katrina but we'll have Trove back again soon!
Next Week: Who's joining us for next Thursday's Expert Q&A?
Vicki Dawson & Rosemary Kopittke from findmypast.com.au Aust & NZ. Find out how to get the most from findmypast
When: NSW-ACT-VIC-TAS: 8:30-9:30pm AEDT | QLD: 7:30-8:30pm | WA: 5:30-6:30pm | NT: 7:00-8:00pm | SA: 8:00-9:00pm.
See you next Thursday, November 22 for more on findmypast.
=====================================================
Read the previous Expert Q&A transcripts:
Thursday, October 4 :: Studying and doing research at UNE